Although approximate Bayesian computation (ABC) has become a popular technique for performing parameter estimation when the likelihood functions are analytically intractable, there has not as yet been a complete investigation of the theoretical properties of the resulting estimators. In this paper we give a theoretical analysis of the asymptotic properties of ABC based parameter estimators for hidden Markov models and show that ABC based estimators satisfy asymptotically biased versions of the standard results in the statistical literature.



1. Introduction. One of the most fundamental problems in statistics is that of parameter estimation. Suppose that one has a collection of probability laws $\mathbb{P}_\theta$ parametrised by a collection of parameter vectors $\theta \in \Theta$. Suppose further that one has data $Z$ generated by a process distributed according to some law $\mathbb{P}_{\theta^*}$, where the exact value of $\theta^* \in \Theta$ is unknown. The problem of parameter estimation is to infer the value of the unknown parameter vector $\theta^*$ from the data $Z$. Many standard methods for estimating the value of $\theta^*$ are based upon using the likelihood function $p_\theta(Z)$. For example, Bayesian approaches use the likelihood to reweight some prior distribution to obtain a posterior distribution on the space of parameter vectors that represents one's sense of certainty of any given parameter vector being equal to $\theta^*$. Alternatively one may take a frequentist approach and estimate $\theta^*$ with the parameter vector which maximises the value of the corresponding likelihood (i.e. maximum likelihood estimation (MLE)).

Of course these approaches all rely on one being able to compute the likelihood functions $p_\theta(Z)$, either exactly or numerically. However, in a wide range of applications this is not possible, either because no analytic expression for the likelihoods exists or else because computing them is computationally intractable. Despite this, one is often still able, in such cases, to generate random variables distributed according to the corresponding laws $\mathbb{P}_\theta$. This has led to the development of methods in which $\theta^*$ is estimated by implementing a standard likelihood based parameter estimator using some principled approximation to the likelihood instead of the true likelihood function itself. In general these approximations are estimated using Monte Carlo simulation based on generating samples from the relevant probability distributions.

A method which has recently become very popular in practice, and on which we shall focus our attention for the rest of this paper, is approximate Bayesian computation (ABC). A non-exhaustive list of references for applications of the method includes: [McKinley et al., 2009, Peters et al., 2010, Pritchard et al., 1999, Ratmann et al., 2009, Tavaré et al., 1997]. See also [Sisson and Fan, to be published] for a review on computational methodology. The standard ABC approach to approximating the likelihood is as follows. Suppose that the distributions $\mathbb{P}_\theta$ all have a density $p_\theta(\cdot)$ on some space $\mathbb{R}^m$ w.r.t. some dominating measure $\mu$. Furthermore suppose that the functions $p_\theta(\cdot)$ cannot be evaluated directly but that one can generate random variables distributed according to the laws $\mathbb{P}_\theta$. Given some data $Z$, the general ABC approach to approximating the values of the likelihood functions $p_\theta(Z)$ is to choose a metric $d(\cdot,\cdot)$ on $\mathbb{R}^m$ and a tolerance parameter $\epsilon > 0$ and, for all $\theta \in \Theta$, approximate the likelihood $p_\theta(Z)$ with

(1) $\mathbb{P}_{\theta}\big(d(\hat{Z}, Z) < \epsilon\big)$,

where $\hat{Z} \sim \mathbb{P}_\theta$. Typically the probabilities (1) are themselves estimated using Monte Carlo techniques. A particularly appealing feature of the ABC methodology is that, despite the method's name, the resulting approximations to the likelihoods may then be used in any likelihood based parameter inference methodology the user desires.
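As an illustration of the Monte Carlo step mentioned above, the following minimal sketch estimates the probability (1) by simulation for a hypothetical scalar toy model $\hat{Z} \sim N(\theta, 1)$ with $d(a, b) = |a - b|$. The model, the function names and the numerical values are assumptions made only for illustration; because the toy model is Gaussian, the Monte Carlo estimate can be checked against a closed form.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def abc_likelihood_mc(z: float, theta: float, eps: float, n_sims: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P_theta(|Z_hat - z| < eps) for the toy model Z_hat ~ N(theta, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        z_hat = rng.gauss(theta, 1.0)  # simulate from the model
        if abs(z_hat - z) < eps:       # metric d(a, b) = |a - b|
            hits += 1
    return hits / n_sims


def abc_likelihood_exact(z: float, theta: float, eps: float) -> float:
    """Closed form of the same probability, available only because the toy model is Gaussian."""
    return normal_cdf(z - theta + eps) - normal_cdf(z - theta - eps)


est = abc_likelihood_mc(z=0.3, theta=0.0, eps=0.1, n_sims=200_000)
exact = abc_likelihood_exact(z=0.3, theta=0.0, eps=0.1)
print(f"Monte Carlo: {est:.4f}  exact: {exact:.4f}")
```

In a genuine application only the simulation-based estimate would be available; the closed form is included here purely as a sanity check.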

Intuitively, the justification for the ABC approximation is that for sufficiently small $\epsilon$

$\frac{1}{\mu(B^{\epsilon}_{Z})}\,\mathbb{P}_{\theta}\big(d(\hat{Z}, Z) < \epsilon\big) \approx p_{\theta}(Z)$,

where $B^{\epsilon}_{Z}$ denotes the $d$-ball of radius $\epsilon$ around the point $Z$, and thus the probabilities (1) will provide a good approximation to the likelihood, up to the value of some renormalising factor which is independent of $\theta$ and hence can be ignored.
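The approximation above can be checked numerically in the same hypothetical Gaussian toy model ($\hat{Z} \sim N(\theta, 1)$ on $\mathbb{R}$, $d(a, b) = |a - b|$, $\mu$ Lebesgue measure, so that $\mu(B^{\epsilon}_{Z}) = 2\epsilon$). The sketch below is an illustration under those assumptions only: it rescales the exact ABC probability by $1/(2\epsilon)$ and compares it with the true density. Note that the factor $2\epsilon$ does not depend on $\theta$, which is why it can be ignored.

```python
import math


def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rescaled_abc_prob(z: float, theta: float, eps: float) -> float:
    """P_theta(|Z_hat - z| < eps) / mu(B_z^eps), where mu(B_z^eps) = 2 * eps on the real line."""
    prob = normal_cdf(z - theta + eps) - normal_cdf(z - theta - eps)
    return prob / (2.0 * eps)


z, theta = 0.3, 0.0
errors = []
for eps in (1.0, 0.1, 0.01, 0.001):
    err = abs(rescaled_abc_prob(z, theta, eps) - normal_pdf(z - theta))
    errors.append(err)
    print(f"eps={eps:<6} |error|={err:.2e}")
```

In this smooth example the error shrinks steadily as $\epsilon$ decreases.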
Clearly, in general the estimators based on ABC approximations to the likelihood will differ from those based on the exact value of the likelihood function. However, although the use of ABC has become commonplace, there has to date been little investigation of the precise nature of the theoretical properties of ABC based estimators. One notable exception is [Fearnhead and Prangle, 2010]. In this paper the authors consider the problem of finding the optimal choice, for a given data set, of summary statistic and $\epsilon$ in order to minimise the mean square error of the resulting ABC posterior distribution on parameter space. Unfortunately the resulting optimal choice of summary statistic involves computing a conditional expectation w.r.t. the unknown posterior distribution and hence it can only be computed approximately and not exactly. Further, the analysis is done only for fixed size data sets and the asymptotic properties of the ABC estimator are left unexplored.

An alternative approach is taken in [Dean et al., 2010] in which the asymp- 
totic behaviour of the MLE implemented with the ABC approximation to 
the likelihood (henceforth ABC MLE) was studied. The analysis in this pa- 
per is based on the observation that the ABC approximation to the likelihood 
can be considered as being equal to the likelihood function of a perturbed 
probability distribution. Using this observation it was shown that ABC MLE 
in some sense inherits its behaviour from the standard MLE but that the 
resulting estimator has an innate asymptotic bias. Furthermore, it is shown that this bias can be made arbitrarily small by choosing a sufficiently small value of the ABC parameter $\epsilon$.

The results in [Dean et al., 2010] concerning the asymptotic behaviour of ABC MLE provide a mathematical justification of this method analogous to that provided for the standard MLE by the results concerning asymptotic consistency. However they do not establish any asymptotic normality
type properties of this estimator and there are as yet no analogous results 
for the ABC Bayesian parameter estimator. The aim of this paper is to 
bridge these theoretical gaps by showing that the standard results in like- 
lihood based parameter estimation, that is to say asymptotic consistency, 
asymptotic normality and Bernstein-von Mises type theorems, also hold in 
a suitably modified version for parameter estimators based on ABC approx- 
imations to the likelihood. In the next section we provide an outline of the 
approach that we shall take to proving these results. 

1.1. Contributions and Structure. In this paper we shall study the asymptotic behaviour of ABC parameter estimators when used to perform inference for hidden Markov models (HMMs). This is convenient because, as we will show, the Markovian context imbues the ABC approximations with a particularly nice mathematical structure. Furthermore, as HMMs are used as statistical models in a wide range of applications including bioinformatics (e.g. [Durbin et al., 1998]), econometrics (e.g. [Kim et al., 1998]) and population genetics (e.g. [Felsenstein and Churchill, 1996]) (see also [Cappé et al., 2005] for a recent overview), the class of models thus considered is sufficiently general to be of genuine practical interest.

For the purpose of this paper a HMM will be considered to be a pair of discrete-time stochastic processes, $\{X_k\}_{k \ge 0}$ and $\{Y_k\}_{k \ge 0}$. The hidden process, $\{X_k\}_{k \ge 0}$, is a homogeneous Markov chain taking values in some Polish space $\mathcal{X}$ and the observed process $\{Y_k\}_{k \ge 0}$ takes values in $\mathbb{R}^m$ for some $m \ge 1$. Conditional on $X_k$, the observation $Y_k$ is statistically independent of the random variables $Y_0, \ldots, Y_{k-1}; X_0, \ldots, X_{k-1}$. In many models the densities of the conditional laws of the observed process w.r.t. the hidden state either have no known analytic expression or else are computationally intractable. In this case standard methods for estimating the likelihoods of the observed process, e.g. SMC, can no longer be used and an alternative approach like ABC must be used. For the rest of this paper we shall consider performing ABC based parameter estimation for HMMs using the following specialization of the standard ABC likelihood approximation (1), proposed in [Jasra et al., 2010] for the case where the observations are generated by a HMM. Specifically, given a sequence of observations $Y_1, \ldots, Y_n$ from a HMM, we shall approximate the corresponding likelihood functions with the probabilities

(2) $\mathbb{P}_\theta\big(\hat{Y}_1 \in B^{\epsilon}_{Y_1}, \ldots, \hat{Y}_n \in B^{\epsilon}_{Y_n}\big)$,

where $\hat{Y}_1, \ldots, \hat{Y}_n$ are simulated from the HMM with parameter $\theta$ and, for all $y \in \mathbb{R}^m$, $B^{\epsilon}_{y}$ denotes the ball of radius $\epsilon$ centered around the
point y. The benefit of this approach is that it retains the Markovian struc- 
ture of the model. This facilitates both simpler Markov chain Monte Carlo 
(MCMC) (e.g. [McKinley et al., 2009]) and sequential Monte Carlo (SMC) 
(e.g. [Jasra et al., 2010]) implementation of the ABC approximation. Fur- 
thermore the resulting approximation has a structure which is particularly 
tractable to mathematical analysis. 
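The probabilities (2) can, in principle, be estimated by naive rejection: simulate whole observation paths and count how often every simulated $\hat{Y}_k$ lands in its ball $B^{\epsilon}_{Y_k}$. The sketch below does this for a hypothetical two-state HMM with Gaussian emissions ($Y_k \mid X_k \sim N(\mp\theta, 1)$); the transition matrix, initial law and all names are illustrative assumptions, not taken from the paper. For fixed $\epsilon$ the acceptance probability typically decays quickly with $n$, which is one reason MCMC and SMC implementations such as those cited above are used in practice.

```python
import random

# Hypothetical toy HMM used only for illustration: hidden chain on {0, 1},
# Y_k ~ N(-theta, 1) if X_k = 0 and Y_k ~ N(+theta, 1) if X_k = 1.
TRANS = [[0.9, 0.1], [0.2, 0.8]]
INIT = [0.5, 0.5]


def simulate_observations(theta, n, rng):
    """Draw X_0 ~ INIT, then X_k ~ TRANS[X_{k-1}] and Y_k | X_k for k = 1..n."""
    x = 0 if rng.random() < INIT[0] else 1
    ys = []
    for _ in range(n):
        x = 0 if rng.random() < TRANS[x][0] else 1
        ys.append(rng.gauss(theta if x == 1 else -theta, 1.0))
    return ys


def abc_hmm_prob_mc(data, theta, eps, n_sims, seed=0):
    """Fraction of simulated paths with every Y_hat_k in the closed ball B^eps_{Y_k}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sim = simulate_observations(theta, len(data), rng)
        if all(abs(s - y) <= eps for s, y in zip(sim, data)):
            hits += 1
    return hits / n_sims


data = simulate_observations(theta=1.0, n=5, rng=random.Random(1))
probs = {eps: abc_hmm_prob_mc(data, theta=1.0, eps=eps, n_sims=20_000) for eps in (0.5, 1.0, 2.0)}
print(probs)
```

Because the same seed is reused for every $\epsilon$, the accepted paths for a smaller tolerance are a subset of those for a larger one, so the estimates are monotone in $\epsilon$.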

The purpose of this paper is to show that one can prove results about the asymptotic behaviour of ABC based parameter estimators analogous to the standard results in the literature concerning the asymptotic behaviour of estimators based on the exact value of the likelihood. In particular we show that one can develop a theoretical justification of ABC parameter estimation procedures, based on their large sample properties, analogous to those provided for Bayesian and maximum likelihood based procedures by the standard Bernstein-von Mises and asymptotic consistency and normality results respectively. Our approach is based on the observation in [Dean et al., 2010] that ABC can be considered as performing parameter estimation using the likelihoods of a collection of perturbed HMMs, which suggests that in some sense ABC based parameter estimators should inherit their behaviour from the standard statistical estimators. We first show that, unlike the MLE, which is asymptotically consistent, the ABC MLE has an innate asymptotic bias, in the sense that the value of the estimator converges to the wrong point in parameter space as the number of observations tends to infinity. Moreover we show that asymptotically the ABC MLE is normally distributed around this biased limit point. Secondly we show that the resulting ABC Bayesian posterior distributions obey a Bernstein-von Mises type theorem but that the posteriors are again asymptotically biased, in the sense that as the number of data points goes to infinity the resulting posterior distributions concentrate about the limit of the ABC MLE rather than the true parameter value. Finally we show that the size of the asymptotic bias of both the ABC Bayesian and ABC MLE estimators goes to zero as $\epsilon$ tends to zero, and under mild regularity conditions we obtain sharp rates for this convergence. Together these results show that ABC based parameter estimates are asymptotically biased, with a bias which can be made arbitrarily small by a suitable choice of $\epsilon$, and thus provide a rigorous justification for performing statistical inference based on ABC approximations to the likelihood.

We note that the results in this paper extend those in [Dean et al., 2010] 
in several ways. In particular we provide a much sharper analysis of the 
ABC MLE than that contained in [Dean et al., 2010]. The crucial difference 
between the current paper and [Dean et al., 2010] is that it is not possible 
using the techniques of [Dean et al., 2010] to show that the ABC MLE has 
a unique limit point. In contrast, in this paper we show that for sufficiently small values of $\epsilon$ the ABC MLE has one and only one limit point. This
then enables us to extend the scope of the analysis in [Dean et al., 2010] to 
include asymptotic normality results for the ABC MLE and Bernstein-von 
Mises type results for ABC based Bayesian estimators. 

This paper is structured as follows. In Section 2 the notation and assump- 
tions are given and in Section 3 we present our main results concerning the 
asymptotic behaviour of ABC. The article is summarized in Section 4 and 
supporting technical lemmas and proofs of some of the theoretical results 
are housed in the four appendices. 

2. Notation and Assumptions. 
2.1. Notation and Main Assumptions. Throughout this paper we shall use lower case letters $x, y, z$ to denote dummy variables and upper case letters $X, Y, Z$ to denote random variables. Observations of a random variable, i.e. data, will be denoted by $Y$. Given any $\epsilon > 0$ and $y \in \mathbb{R}^m$ we shall let $B^{\epsilon}_{y}$ denote the closed ball of radius $\epsilon$ centered on the point $y$ and let $U_{B^{\epsilon}_{y}}$ denote the uniform distribution on $B^{\epsilon}_{y}$. For any $A \subset \mathbb{R}^m$ the indicator function of $A$ will be denoted by $\mathbb{I}_A$.

In what follows we need to refer to various different scalar, vector and matrix norms. Given a scalar $z$ and a vector $a$ we shall let $|z|$ and $|a|$ denote the absolute value and the standard Euclidean vector norm respectively, and for any matrix $M$ we shall let $\|M\|$ denote the Frobenius norm. We note that although using $|\cdot|$ to denote multiple norms is an abuse of notation, there is in practice no loss of clarity as the precise meaning of these terms will always be clear from the context in which they are used.

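For any vector of variables $a$ we shall let $\nabla_a$ denote the gradient operator with respect to $a$. Moreover, given vectors of variables $a, b, c$ of dimensions $d_1, d_2$ and $d_3$, we shall let $\nabla_a\nabla_b$ and $\nabla_a\nabla_b\nabla_c$ denote the $d_1 \times d_2$ and $d_1 \times d_2 \times d_3$ matrices of partial derivatives with entries given by $\frac{\partial^2}{\partial a_i \partial b_j}$ and $\frac{\partial^3}{\partial a_i \partial b_j \partial c_k}$ respectively. Further, for any vector of variables $a$ we shall let $\nabla^2_a$ and $\nabla^3_a$ denote $\nabla_a\nabla_a$ and $\nabla_a\nabla_a\nabla_a$ respectively. Finally, given vectors $u, v, w$ we shall let $u \otimes v$ and $u \otimes v \otimes w$ denote the outer products of $u, v$ and of $u, v, w$, and let $u^{\otimes 2}$ and $u^{\otimes 3}$ denote the outer products $u \otimes u$ and $u \otimes u \otimes u$ respectively.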

It is assumed that for any HMM the hidden state $\{X_k\}_{k \ge 0}$ is time-homogeneous and takes values in a compact Polish space $\mathcal{X}$ with associated Borel $\sigma$-field $\mathcal{B}(\mathcal{X})$. Throughout this paper it will be assumed that we have a collection of HMMs all defined on the same state space and parametrised by some parameter vector $\theta$ taking values in a connected compact set $\Theta \subset \mathbb{R}^d$. Furthermore we shall reserve $\theta^*$ to denote the 'true' value of the parameter vector $\theta$. For each $\theta \in \Theta$ we shall let $Q_\theta(x, \cdot)$ denote the transition kernel of the corresponding Markov chain, and for each $x \in \mathcal{X}$ and $\theta \in \Theta$ we assume that $Q_\theta(x, \cdot)$ has a density $q_\theta(x, \cdot)$ w.r.t. some common finite dominating measure $\mu$ on $\mathcal{X}$. The initial distribution of the hidden state will be denoted by $\pi_0$.

We also assume that the observations $\{Y_k\}_{k \ge 0}$ take values in a state space $\mathcal{Y} \subseteq \mathbb{R}^m$ for some $m \ge 1$. Furthermore, for each $k$ we assume that the random variable $Y_k$ is conditionally independent of $\ldots, X_{k-1}; X_{k+1}, \ldots$ and $\ldots, Y_{k-1}; Y_{k+1}, \ldots$ given $X_k$, and that the conditional laws have densities $g_\theta(y|x)$ w.r.t. some common $\sigma$-finite dominating measure $\nu$. We further assume that for every $\theta$ the joint chain $\{X_k, Y_k\}_{k \ge 0}$ is positive Harris recurrent and has a unique invariant distribution $\pi_\theta$. For each $\theta \in \Theta$ we shall let $\mathbb{P}_\theta$ denote the law of the stationary version of the corresponding HMM and $\mathbb{E}_\theta$ denote expectations with respect to $\mathbb{P}_\theta$.

We shall frequently have to refer to various kinds of finite, infinite and doubly infinite sequences. For brevity the following shorthand notations are used. For any pair of integers $k \le n$, $Y_{k:n}$ denotes the sequence of random variables $Y_k, \ldots, Y_n$; $Y_{-\infty:k}$ denotes the sequence $\ldots, Y_k$; $Y_{n:\infty}$ denotes the sequence $Y_n, \ldots$; and $Y_{-\infty:k;n:\infty}$ denotes the sequence $\ldots, Y_k; Y_n, \ldots$. Further, given a measure $\mu$ on a Polish space $\mathcal{X}$ we let $\int \cdot\, \mu(dx_{1:n})$ denote integration w.r.t. the $n$-fold product measure $\mu^{\otimes n}$ on the $n$-fold product space $\mathcal{X}^n$.

For any two probability measures $\mu_1, \mu_2$ on a measurable space $(E, \mathcal{E})$ we let $\|\mu_1 - \mu_2\|_{TV}$ denote the total variation distance between them. For all $p \in [1, \infty)$ we let $L_p(\mu)$ denote the set of real valued measurable functions $f$ satisfying $\int |f(x)|^p\, \mu(dx) < \infty$.

Finally we note that when writing the likelihood $p_\theta(Y_1, \ldots, Y_n)$ of a sequence of observations $Y_1, \ldots, Y_n$ we shall typically suppress the dependence of the likelihood function on the initial condition of the hidden state of the process, unless we specifically need to refer to it, in which case we shall write the likelihood as $p_\theta(Y_1, \ldots, Y_n | X_0 = x)$.

2.2. Particular Assumptions. In addition to the assumptions above, the 
following particular assumptions are made at various points in the article. 

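(A1) The parameter vector $\theta^*$ belongs to the interior of $\Theta$ and $\theta = \theta^*$ if and only if $\mathbb{P}_\theta(\ldots, Y_{-1}, Y_0, Y_1, \ldots) = \mathbb{P}_{\theta^*}(\ldots, Y_{-1}, Y_0, Y_1, \ldots)$.

(A2) For all $y \in \mathcal{Y}$, $x, x' \in \mathcal{X}$, the mappings $\theta \mapsto q_\theta(x, x')$ and $\theta \mapsto g_\theta(y|x)$ are three times continuously differentiable w.r.t. $\theta$.

(A3) There exist constants $\underline{c}_1, \bar{c}_1 \in (0, \infty)$ such that for every $y \in \mathcal{Y}$, $x, x' \in \mathcal{X}$, $\theta \in \Theta$,

(3) $\underline{c}_1 \le q_\theta(x, x') \le \bar{c}_1, \qquad g_\theta(y|x) \le \bar{c}_1.$

(A4) There exists a constant $c_2 \in (0, \infty)$ such that for every $y \in \mathcal{Y}$, $x, x' \in \mathcal{X}$, $\theta \in \Theta$,

$|\nabla_\theta \log q_\theta(x, x')|, \ \|\nabla^2_\theta \log q_\theta(x, x')\| < c_2.$

(A5) For all $\theta \in \Theta$,

(4) $0 < \int_{\mathcal{X}} g_\theta(y|x)\, \mu(dx) < \infty$ for all $y \in \mathcal{Y}$.

(A6) For any $K > 0$,

(5) $\mathbb{E}_{\theta^*}\Big[\sup_{x \in \mathcal{X}} \sup_{\theta \in \Theta} \sup_{z \in B^{K}_{0}} |\nabla_\theta \log g_\theta(Y + z|x)|\Big] < \infty,$
$\mathbb{E}_{\theta^*}\Big[\sup_{x \in \mathcal{X}} \sup_{\theta \in \Theta} \sup_{z \in B^{K}_{0}} \|\nabla^2_\theta \log g_\theta(Y + z|x)\|\Big] < \infty,$
$\mathbb{E}_{\theta^*}\Big[\sup_{x \in \mathcal{X}} \sup_{\theta \in \Theta} \sup_{z \in B^{K}_{0}} \|\nabla^3_\theta \log g_\theta(Y + z|x)\|\Big] < \infty.$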



Remark 1. Assumptions (A1)-(A6) are similar to those used in [Douc et al., 2004] to prove consistency of the MLE for HMMs. We use similar assumptions in this paper as, broadly speaking, our approach will be to show that the ABC parameter estimators inherit their properties from standard statistical estimators. However the methods and emphasis of this paper differ from those in [Douc et al., 2004] and as a result the assumptions we require have a slightly different flavour. In particular we shall require slightly stronger conditions on the differentiability of the conditional densities $g_\theta(y|x)$ but slightly weaker conditions on their integrability.

Remark 2. In general, assumptions (A3)-(A6) will hold when the state space $\mathcal{X}$ is compact. However we expect that the behaviours predicted by Theorems 2, 3, 4 and 5 will provide a good qualitative guide to the behaviour of ABC MLE in practice even in cases where the underlying HMMs do not satisfy these assumptions.

3. Approximate Bayesian Computation. 

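3.1. Structure of ABC Estimators. Suppose that a collection of HMMs

(6) $\{X_k, Y_k\}_{k \ge 0}$

parameterised by some $\theta \in \Theta$ are given. For any sequence of observations $Y_1, \ldots, Y_n$ and $\theta \in \Theta$ let $p_\theta(Y_1, \ldots, Y_n)$ denote the likelihood of the observations under the corresponding HMM (6). Following [Jasra et al., 2010] we consider approximating $p_\theta(Y_1, \ldots, Y_n)$ by the ABC approximation

(7) $\mathbb{P}_\theta\big(\hat{Y}_1 \in B^{\epsilon}_{Y_1}, \ldots, \hat{Y}_n \in B^{\epsilon}_{Y_n}\big) = \int \prod_{k=1}^{n} q_\theta(x_{k-1}, x_k)\, \mathbb{I}_{B^{\epsilon}_{Y_k}}(y_k)\, g_\theta(y_k|x_k)\; \pi_0(dx_0)\, \mu(dx_{1:n})\, \nu(dy_{1:n}).$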
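The purpose of this paper is to analyse the asymptotic properties of likelihood based parameter estimators implemented using the ABC approximate likelihoods (7). The key to our analysis is the following observation (see [Dean et al., 2010] for more details):

(8) $\mathbb{P}_\theta\big(\hat{Y}_1 \in B^{\epsilon}_{Y_1}, \ldots, \hat{Y}_n \in B^{\epsilon}_{Y_n}\big) \propto \int \prod_{k=1}^{n} q_\theta(x_{k-1}, x_k)\, g^{\epsilon}_\theta(Y_k|x_k)\; \pi_0(dx_0)\, \mu(dx_{1:n}),$

where

(9) $g^{\epsilon}_\theta(y|x) := \frac{1}{\nu(B^{\epsilon}_{y})} \int_{B^{\epsilon}_{y}} g_\theta(y'|x)\, \nu(dy').$

Indeed, integrating out each $y_k$ in (7) gives $\int_{B^{\epsilon}_{Y_k}} g_\theta(y_k|x_k)\, \nu(dy_k) = \nu(B^{\epsilon}_{Y_k})\, g^{\epsilon}_\theta(Y_k|x_k)$, and the resulting constant $\prod_{k=1}^{n} \nu(B^{\epsilon}_{Y_k})$ does not depend on $\theta$. The crucial point is that the quantity $g^{\epsilon}_\theta(y|x)$ defined in (9) is the density of the measure obtained by convolving the measure corresponding to $g_\theta(y|x)$ with $U_{B^{\epsilon}_{0}}$, where the density is taken w.r.t. the new dominating measure obtained by convolving $\nu$ with $U_{B^{\epsilon}_{0}}$. One can then immediately see that the quantities $q_\theta(x, x')$ and $g^{\epsilon}_\theta(y|x)$ appearing in (8) are the transition kernels and conditional laws respectively of a perturbed HMM $\{X_k, Y^{\epsilon}_k\}_{k \ge 0}$ defined such that it is equal in law to the process

(10) $\{X_k, Y_k + \epsilon Z_k\}_{k \ge 0},$

where $\{X_k, Y_k\}_{k \ge 0}$ is the original HMM and $\{Z_k\}_{k \ge 0}$ is an i.i.d. sequence of $U_{B^{1}_{0}}$ distributed random variables.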
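To make (8) and (9) concrete, the following sketch assumes a hypothetical finite-state HMM with $N(\text{mean}, 1)$ emissions and $\nu$ equal to Lebesgue measure on $\mathbb{R}$, so that $\nu(B^{\epsilon}_{y}) = 2\epsilon$ and $g^{\epsilon}_\theta$ has a closed form in terms of the normal CDF. It evaluates the integral in (8) with a standard normalised forward recursion and compares the result with the unperturbed log-likelihood ($\epsilon = 0$). All names and numbers are illustrative assumptions, not the authors' implementation.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def perturbed_emission(y, mean, eps):
    """g^eps(y | x) from (9) for an N(mean, 1) emission, with nu = Lebesgue measure on R."""
    if eps == 0.0:
        return normal_pdf(y - mean)  # eps = 0: the unperturbed density g(y | x)
    # (1 / nu(B^eps_y)) * integral of g(. | x) over [y - eps, y + eps]
    return (normal_cdf(y + eps - mean) - normal_cdf(y - eps - mean)) / (2.0 * eps)


def forward_log_lik(ys, means, trans, init, eps):
    """log of the integral in (8), i.e. the perturbed HMM log-likelihood, via a forward recursion."""
    n_states = len(init)
    alpha = list(init)  # law of X_0
    log_lik = 0.0
    for y in ys:
        # predict: law of X_k given Y_1..Y_{k-1}
        pred = [sum(alpha[i] * trans[i][j] for i in range(n_states)) for j in range(n_states)]
        # update: weight by the perturbed emission density g^eps(Y_k | x_k)
        unnorm = [pred[j] * perturbed_emission(y, means[j], eps) for j in range(n_states)]
        c = sum(unnorm)  # one-step predictive density of Y_k
        log_lik += math.log(c)
        alpha = [u / c for u in unnorm]
    return log_lik


# Hypothetical two-state example (illustrative numbers only).
ys = [-1.9, 2.1, 1.8, -2.2, -2.0]
means = [-2.0, 2.0]
trans = [[0.9, 0.1], [0.2, 0.8]]
init = [0.5, 0.5]

exact = forward_log_lik(ys, means, trans, init, eps=0.0)
gaps = [abs(forward_log_lik(ys, means, trans, init, eps) - exact) for eps in (0.5, 0.1, 0.01)]
print(gaps)
```

The gap between the perturbed and exact log-likelihoods shrinks as $\epsilon \to 0$, in line with the perturbed HMM (10) approaching the original one.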

3.2. Theoretical Results. It follows that performing statistical inference 
using the ABC approximations to the likelihood is equivalent to performing 
inference using a misspecified collection of models. It is well known (see for 
example [White, 1982]) that this will in general lead to biased estimates of 
the true parameter value. In the rest of this paper we shall investigate the 
theoretical consequences of this for ABC based parameter estimators. 

We start by showing that almost surely the ABC MLE will converge, with increasing sample size, to a given point in parameter space that is not equal to the true parameter value (more generally, the set of accumulation points will belong to a given subset of parameter space) and hence that the ABC MLE is asymptotically biased (Theorem 2). Further, we show that these accumulation points must lie in some neighbourhood of the true parameter value and that the size of this neighbourhood shrinks to zero as $\epsilon$ goes to zero. Next we show that for sufficiently small values of $\epsilon$ the ABC MLE has a unique limit point and that asymptotically the ABC MLE is normally distributed about this point with a variance that is proportional to $1/n$ (Theorem 3). Third, we show that asymptotically the ABC Bayesian posterior converges to that of a Normal random variable, centered on the location of the ABC MLE and with variance again proportional to $1/n$ (Theorem 4). Finally we show that under certain Lipschitz conditions one can obtain a rate for the decrease in the size of the asymptotic bias of the ABC parameter estimators (Theorems 5 and 6).

These results show that the error of ABC based parameter estimators may be decomposed into two parts: a bias component whose size depends on $\epsilon$ and a variance component whose size is proportional to $1/n$. Furthermore they show that the size of the bias can be made arbitrarily small by a suitable choice of $\epsilon$. Thus, taken together, the results show that the accuracy of estimators based on ABC approximations to the likelihood can be made arbitrarily close to that of estimators based on the exact value of the likelihood, providing a rigorous mathematical justification for the ABC methodology.

We note that there are two important technical issues that arise in the proofs of these results. Firstly, as noted in [Dean et al., 2010], one cannot simply analyse the behaviour of the ABC MLE by extending the parameter space to include $\epsilon$ and then applying standard results from the theory of MLE, because the perturbed likelihoods $g^{\epsilon}_\theta(y|x)$ are in some sense insufficiently continuous. Instead one has to establish that in some sense the Lebesgue differentiation theorem still holds upon taking asymptotic limits.

Secondly we note that because the dominating measures of the original and perturbed HMMs are no longer necessarily mutually absolutely continuous with respect to each other, we can no longer take the standard approach to analysing likelihood based estimators, which is to study the limits

$\lim_{n \to \infty} \frac{1}{n} \log p_\theta(Y_1, \ldots, Y_n)$

and interpret them in terms of Kullback-Leibler distances. To avoid this problem we instead show that for any $\epsilon$ the relative mean log likelihood surfaces (considered as functions of $\theta$)

$\frac{1}{n}\big(\log p^{\epsilon}_\theta(Y_1, \ldots, Y_n) - \log p^{\epsilon}_{\theta^*}(Y_1, \ldots, Y_n)\big),$

where $p^{\epsilon}_\theta$ denotes the likelihood of the perturbed HMM (10), almost surely converge to some limiting surface $l^{\epsilon}(\theta)$. The behaviour of ABC based parameter estimators can then be understood by examining the behaviour of the corresponding limiting log likelihood surfaces. The key result in doing so is the following, whose proof is deferred until Appendix B.
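Theorem 1. Suppose that one has a collection of HMMs parameterized by some parameter vector $\theta \in \Theta$ that satisfy assumptions (A1)-(A6). For any $\epsilon > 0$ let $p^{\epsilon}_\theta(\cdots)$ denote the likelihood function w.r.t. the perturbed HMMs (10) (and where by definition we let $p^{0}_\theta(\cdots)$ denote the likelihood function of the original HMM (6)). Let data $Y_1, \ldots, Y_n$ generated by the HMM corresponding to an unknown parameter vector $\theta^*$ be given. Then for every $\epsilon \ge 0$ there exists a twice continuously differentiable function $l^{\epsilon}(\theta) : \Theta \to \mathbb{R}$ such that for all $x \in \mathcal{X}$ one has that, $\mathbb{P}_{\theta^*}$ a.s.,

(11)
$\frac{1}{n}\big(\log p^{\epsilon}_\theta(Y_1, \ldots, Y_n | X_0 = x) - \log p^{\epsilon}_{\theta^*}(Y_1, \ldots, Y_n | X_0 = x)\big) \to l^{\epsilon}(\theta),$
$\frac{1}{n}\nabla_\theta\big(\log p^{\epsilon}_\theta(Y_1, \ldots, Y_n | X_0 = x) - \log p^{\epsilon}_{\theta^*}(Y_1, \ldots, Y_n | X_0 = x)\big) \to \nabla_\theta l^{\epsilon}(\theta),$
$\frac{1}{n}\nabla^2_\theta\big(\log p^{\epsilon}_\theta(Y_1, \ldots, Y_n | X_0 = x) - \log p^{\epsilon}_{\theta^*}(Y_1, \ldots, Y_n | X_0 = x)\big) \to \nabla^2_\theta l^{\epsilon}(\theta),$

uniformly in $\theta$.

Furthermore $l^{\epsilon}(\theta) \to l^{0}(\theta)$, $\nabla_\theta l^{\epsilon}(\theta) \to \nabla_\theta l^{0}(\theta)$ and $\nabla^2_\theta l^{\epsilon}(\theta) \to \nabla^2_\theta l^{0}(\theta)$ as $\epsilon \to 0$, where the convergence is again uniform in $\theta$.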

We can now use Theorem 1 to analyse ABC based parameter estimators by comparing their asymptotic behaviour (encapsulated in the surfaces $l^{\epsilon}(\theta)$) to the asymptotic behaviour of estimators based on using the true value of the likelihood (which is encapsulated in the surface $l^{0}(\theta)$). We shall start by analysing the behaviour of the ABC MLE, which we formally define below.

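Procedure 1 (ABC MLE). Given $\epsilon > 0$ and data $Y_1, \ldots, Y_n$, estimate $\theta^*$ with

(12) $\hat{\theta}_{n,\epsilon} = \arg\max_{\theta \in \Theta} \mathbb{P}_\theta\big(\hat{Y}_1 \in B^{\epsilon}_{Y_1}, \ldots, \hat{Y}_n \in B^{\epsilon}_{Y_n}\big).$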
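A minimal sketch of Procedure 1, assuming a hypothetical two-state HMM with $N(\mp\theta, 1)$ emissions, $\nu$ equal to Lebesgue measure, and maximisation over a finite grid rather than a continuous optimiser. By (8) the objective in (12) equals the perturbed likelihood times the $\theta$-free constant $(2\epsilon)^n$, so the grid search maximises the perturbed log-likelihood computed by a forward recursion. The model, grid and all names are illustrative assumptions.

```python
import math
import random

# Hypothetical toy HMM (illustration only): hidden X_k in {0, 1}, Y_k | X_k ~ N(-theta, 1) or N(+theta, 1).
TRANS = [[0.9, 0.1], [0.2, 0.8]]
INIT = [0.5, 0.5]


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate(theta, n, rng):
    """Simulate Y_1..Y_n from the toy HMM."""
    x = 0 if rng.random() < INIT[0] else 1
    ys = []
    for _ in range(n):
        x = 0 if rng.random() < TRANS[x][0] else 1
        ys.append(rng.gauss(theta if x == 1 else -theta, 1.0))
    return ys


def abc_log_lik(ys, theta, eps):
    """log of the objective in (12) minus the theta-free constant n * log(2 * eps)."""
    means = (-theta, theta)
    alpha = list(INIT)
    total = 0.0
    for y in ys:
        pred = [alpha[0] * TRANS[0][j] + alpha[1] * TRANS[1][j] for j in range(2)]
        # perturbed emission density: N(mean, 1) mass on [y - eps, y + eps] divided by 2 * eps
        g_eps = [(norm_cdf(y + eps - m) - norm_cdf(y - eps - m)) / (2.0 * eps) for m in means]
        unnorm = [pred[j] * g_eps[j] for j in range(2)]
        c = unnorm[0] + unnorm[1]
        total += math.log(c)
        alpha = [u / c for u in unnorm]
    return total


def abc_mle_on_grid(ys, eps, grid):
    """Procedure 1 restricted to a finite grid of candidate parameter values."""
    return max(grid, key=lambda t: abc_log_lik(ys, t, eps))


data = simulate(theta=1.5, n=2000, rng=random.Random(42))
grid = [1.0 + 0.05 * i for i in range(21)]
theta_hat = abc_mle_on_grid(data, eps=0.1, grid=grid)
print(f"grid ABC MLE: {theta_hat:.2f}")
```

Rerunning with several values of `eps` is a simple way to explore empirically how the limit point of the ABC MLE moves with the tolerance.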

Using Theorem 1 we can now establish the following biased asymptotic 
consistency and normality type properties of the ABC MLE whose proofs 
are deferred to Appendix C. 

Theorem 2. Suppose that one has a collection of HMMs parameterized by some parameter vector $\theta \in \Theta$ that satisfy assumptions (A1)-(A6). Let data $Y_1, \ldots, Y_n$ generated by the HMM corresponding to an unknown parameter vector $\theta^*$ be given and suppose that we use the ABC MLE to estimate the value of $\theta^*$. Then for every $\epsilon > 0$ there exists a collection of sets

DEAN ET AL. 



T*^ such that for all initial conditions Xq the set of accumulation points of 
the ABC MLE 6^ lies fg, a.s. in and 

(13) lim sup |6l - r| = 0. 

Furthermore let P{9) he as in Theorem 1. IfVgl^ (6*) is strictly negative def- 
inite then for sufficiently small values of e the set T*^ consists of a singleton 

Remark 3. The quantity $-\nabla^2_\theta l^{0}(\theta^*)$ is equal to the asymptotic Fisher information $I$ of the HMM. For more details see [Douc et al., 2004].

Theorem 3. Suppose that one has a collection of HMMs parameterized by some parameter vector $\theta \in \Theta$ that satisfy assumptions (A1)-(A6) and that $\nabla^2_\theta l^{0}(\theta^*)$ is strictly negative definite, where $l^{0}(\theta)$ is as in Theorem 1. Let data $Y_1, \ldots, Y_n$ generated by the HMM corresponding to an unknown parameter vector $\theta^*$ be given and suppose that we use the ABC MLE to estimate the value of $\theta^*$. Then for sufficiently small values of $\epsilon$ there exist strictly positive definite matrices $J_\epsilon, I_\epsilon$ such that, under $\mathbb{P}_{\theta^*}$,

(14) $\sqrt{n}\,\big(\hat{\theta}_{n,\epsilon} - \theta^{*,\epsilon}\big) \Rightarrow N\big(0, I_\epsilon^{-1} J_\epsilon I_\epsilon^{-1}\big).$

Furthermore $J_\epsilon, I_\epsilon \to I$ as $\epsilon \to 0$, where $I$ is as in Remark 3.
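The limiting covariance in (14) has the familiar "sandwich" form that arises for misspecified models (compare [White, 1982]), and since $J_\epsilon, I_\epsilon \to I$ it collapses to $I^{-1}$ as $\epsilon \to 0$. The pure-Python sketch below evaluates $I_\epsilon^{-1} J_\epsilon I_\epsilon^{-1}$ for a two-parameter case; the matrices are hypothetical numbers chosen only to illustrate the formula and are not derived from any HMM.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2 x 2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def sandwich_cov(i_eps, j_eps):
    """Asymptotic covariance I_eps^{-1} J_eps I_eps^{-1} appearing in (14)."""
    i_inv = inv2(i_eps)
    return matmul2(matmul2(i_inv, j_eps), i_inv)


# Hypothetical positive definite matrices standing in for I_eps and J_eps.
i_eps = [[2.0, 0.5], [0.5, 1.0]]
j_eps = [[2.4, 0.6], [0.6, 1.3]]

cov = sandwich_cov(i_eps, j_eps)
cov_equal = sandwich_cov(i_eps, i_eps)  # when J_eps = I_eps the sandwich reduces to I_eps^{-1}
print(cov)
print(cov_equal, inv2(i_eps))
```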

Next we consider the properties of the ABC Bayesian parameter estimator 
which we define below. 

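Procedure 2 (ABC Bayesian Estimator). Given $\epsilon > 0$, a prior distribution $\pi_0$ and data $Y_1, \ldots, Y_n$, estimate $\theta^*$ via the ABC posterior

(15) $\pi^{\epsilon}_{n}(d\theta) \propto \mathbb{P}_\theta\big(\hat{Y}_1 \in B^{\epsilon}_{Y_1}, \ldots, \hat{Y}_n \in B^{\epsilon}_{Y_n}\big)\, \pi_0(d\theta).$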
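On a finite grid of parameter values, the ABC posterior (15) can be normalised directly from log ABC likelihoods; any additive constant that does not depend on $\theta$ (such as $n \log(2\epsilon)$ in the Gaussian examples above) cancels. The sketch below is a hypothetical illustration: the quadratic log-likelihood surface and the flat prior are made-up inputs, and the log-sum-exp shift is included only for numerical stability.

```python
import math


def grid_abc_posterior(thetas, log_abc_liks, log_prior):
    """Normalised ABC posterior (15) on a finite grid.

    log_abc_liks[i] is log P_theta(Y_1 in B^eps_{Y_1}, ..., Y_n in B^eps_{Y_n}) at thetas[i],
    up to an additive constant that does not depend on theta; log_prior gives log pi_0(theta).
    """
    log_post = [ll + log_prior(t) for t, ll in zip(thetas, log_abc_liks)]
    shift = max(log_post)  # log-sum-exp trick: avoid underflow when exponentiating
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical inputs: a quadratic log-likelihood surface peaked near 1.02 and a flat prior.
thetas = [0.5 + 0.05 * i for i in range(21)]
log_liks = [-500.0 * (t - 1.02) ** 2 for t in thetas]
post = grid_abc_posterior(thetas, log_liks, log_prior=lambda t: 0.0)
print(thetas[post.index(max(post))])
```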

Given Theorem 1 we can easily see that the ABC Bayesian estimator satisfies the following Bernstein-von Mises type theorem (see [Borwanker et al., 1971]), whose proof is again deferred to Appendix C.

Theorem 4. Suppose that the assumptions of Theorem 3 hold and that one tries to infer the true value of $\theta^*$ using the ABC approximate Bayesian posterior (15). Suppose further that the prior distribution has a continuous density w.r.t. Lebesgue measure. Then for sufficiently small values of $\epsilon$ one has that, $\mathbb{P}_{\theta^*}$ a.s.,

(16) $\pi^{\epsilon}_{n}\big(\sqrt{n}\,(\theta - \hat{\theta}_{n,\epsilon})\big) \Rightarrow N\big(0, I_\epsilon^{-1}\big),$

where $I_\epsilon$ is as in Theorem 3.
3.3. Asymptotic Rates of Convergence. Theorems 2, 3 and 4 show that asymptotically ABC based parameter estimators concentrate around a point $\theta^{*,\epsilon} \neq \theta^*$ and that the asymptotic bias will be of order $|\theta^{*,\epsilon} - \theta^*|$. It is natural to ask at what rate $\theta^{*,\epsilon} \to \theta^*$ as $\epsilon \to 0$. We begin our answer to this question with the following example.



Example 1. Let $\pi_1$ be the distribution on the set of dyadic numbers of the form $\ldots$, $k = 0, 1, \ldots$, given by $\pi_1(\ldots) = \ldots$ for all $k$, and let $\pi_2$ be the distribution on the set of dyadic numbers of the form $\ldots$ given by $\pi_2(\ldots) = \ldots$, $k = 0, 1, \ldots$. Furthermore let $\{\pi_\theta\}_{\theta \in [0.25, 0.75]}$ be the set of distributions defined such that for all $\theta$, $\pi_\theta = \theta\pi_1 + (1 - \theta)\pi_2$.

It is clear that the distributions $\pi_\theta$ satisfy the conditions of Theorem 1 and hence that for any $\epsilon$ the limiting approximate mean log likelihood surface $l^{\epsilon}(\theta)$ exists and is well defined. Further, if we assume that the true value of the parameter is equal to $\theta^* = \ldots$, then it is easy to show that $\nabla_\theta l^{0}(\theta^*) \ldots$ and that for all $k > 0$, $\nabla_\theta l^{\epsilon}(\theta^*) = \ldots$, from which it follows that

$\ldots \frac{1}{4^{k+1}} + \ldots$

The above example shows that in the general case one should expect that the size of the asymptotic bias will be at least $O(\epsilon)$. The next theorem shows that the behaviour of the asymptotic bias will be no worse than this. In order for it to hold we need to make the following Lipschitz assumptions.

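(A7) There exists some $R > 0$ such that for all $\epsilon < R$,

(17) $\mathbb{E}_{\theta^*}\Big[\sup_{x \in \mathcal{X}} \sup_{\theta \in \Theta} \sup_{z \in B^{\epsilon}_{0}} \frac{|\nabla_z g_\theta(Y + z|x)|}{g_\theta(Y|x)}\Big] < \infty, \qquad \mathbb{E}_{\theta^*}\Big[\sup_{x \in \mathcal{X}} \sup_{\theta \in \Theta} \sup_{z \in B^{\epsilon}_{0}} \frac{\|\nabla_z(\nabla_\theta g_\theta(Y + z|x))\|}{g_\theta(Y|x)}\Big] < \infty.$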



Theorem 5. Suppose that in addition to all of the assumptions of Theorem 4 one has that assumption (A7) above also holds. Then

(18) $|\theta^{*,\epsilon} - \theta^*| = O(\epsilon).$

Moreover, if the dominating measure $\nu$ is Lebesgue measure then one can show, under slightly stronger Lipschitz assumptions, that the asymptotic error in the ABC parameter estimate is of order $O(\epsilon^2)$.
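(A8) There exists some $R > 0$ such that for all $\epsilon < R$,

(19) $\mathbb{E}_{\theta^*}\Big[\sup_{x \in \mathcal{X}} \sup_{\theta \in \Theta} \sup_{z \in B^{\epsilon}_{0}} \frac{\|\nabla^2_z g_\theta(Y + z|x)\|}{g_\theta(Y|x)}\Big] < \infty, \qquad \mathbb{E}_{\theta^*}\Big[\sup_{x \in \mathcal{X}} \sup_{\theta \in \Theta} \sup_{z \in B^{\epsilon}_{0}} \frac{\|\nabla^2_z(\nabla_\theta g_\theta(Y + z|x))\|}{g_\theta(Y|x)}\Big] < \infty.$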



Theorem 6. Suppose that $\nu$ is Lebesgue measure and that in addition to all of the assumptions of Theorem 5 one has that assumption (A8) above also holds. Then

(20) $|\theta^{*,\epsilon} - \theta^*| = O(\epsilon^2).$



The proofs of Theorems 5 and 6 are deferred to Appendix D. Finally we note that in the case that $\nu$ is Lebesgue measure we have from Theorems 3 and 4 that the statistical fluctuations of ABC based estimators about $\theta^{*,\epsilon}$ are of order $O(1/\sqrt{n})$ (i.e. their variance is of order $O(1/n)$), while by Theorem 6 their bias is of order $O(\epsilon^2)$. It follows that (at least in theory) it is optimal to scale $\epsilon$ as $O(1/\sqrt[4]{n})$ as $n$ goes to infinity. Intriguingly this is the same rate as the optimal bandwidth in kernel density estimation (see for example [Wand and Jones, 1995]). This suggests an alternative interpretation of ABC as approximating the likelihood via a kind of kernel density based estimate.
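The scaling argument can be written out as a short back-of-the-envelope balance. It assumes, for illustration only, that the $O(\epsilon^2)$ bias of Theorem 6 and the $O(1/\sqrt{n})$ fluctuations of Theorems 3 and 4 hold with fixed constants $c_b$ and $c_s$; these two symbols are introduced here and do not appear in the theorems.

```latex
\hat{\theta}_{n,\epsilon} - \theta^*
  = \underbrace{\big(\theta^{*,\epsilon} - \theta^*\big)}_{\text{bias: } |\cdot| \,\le\, c_b \epsilon^2}
  + \underbrace{\big(\hat{\theta}_{n,\epsilon} - \theta^{*,\epsilon}\big)}_{\text{fluctuation: } O_{\mathbb{P}}(c_s n^{-1/2})},
\qquad
c_b \epsilon^2 \asymp c_s n^{-1/2}
\;\Longleftrightarrow\;
\epsilon \asymp \Big(\frac{c_s}{c_b}\Big)^{1/2} n^{-1/4}.
```

Taking $\epsilon$ of smaller order than $n^{-1/4}$ does not improve the order of the total error, since the fluctuation term then dominates.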

4. Summary. In this paper we have shown that the framework developed in [Dean et al., 2010] to analyse the behaviour of the ABC MLE can be extended to provide a rigorous analysis of the behaviour of ABC based estimators in both the Bayesian and frequentist contexts. In particular we have shown that ABC based parameter estimators satisfy results analogous to the asymptotic consistency, asymptotic normality and Bernstein-von Mises theorems for standard parameter estimators, but that the ABC estimators are asymptotically biased. Furthermore we have shown that this asymptotic bias can be made arbitrarily small by choosing a sufficiently small value of the parameter $\epsilon$. Together these theoretical results help to solidify and extend existing intuition and provide a rigorous theoretical justification for ABC based parameter estimation procedures.

Appendix A: Auxiliary Results. In this section we present without proof some well known results that will be needed in the proofs of Theorems 1, 2, 3, 4 and 5. The first two lemmas are standard results from real analysis.
Lemma 1. Let a connected compact set $G \subset \mathbb{R}^d$ and some constant $K > 0$ be given. Suppose that there exists a continuous function $f : G \to \mathbb{R}^{d'}$ and a sequence of continuous functions $f_n : G \to \mathbb{R}^{d'}$, $n \ge 1$, such that for all $n$ the function $f_n$ is $K$-Lipschitz continuous. Then $f_n \to f$ uniformly in $G$ if and only if $f_n \to f$ pointwise on a countable dense subset of $G$.

Lemma 2. Let a connected compact set $G \subset \mathbb{R}^d$ be given and suppose that there exists a continuous function $g : G \to \mathbb{R}^{d' \times d}$ and a sequence of continuously differentiable functions $f_n : G \to \mathbb{R}^{d'}$, $n \ge 1$, such that $\nabla f_n(z) \to g(z)$ uniformly in $z$ and $f_n(z^*)$ is Cauchy for some $z^* \in G$. Then there exists a uniformly bounded and continuously differentiable function $f$ such that $f_n(z) \to f(z)$ uniformly in $z$ and $\nabla f(z) = g(z)$.

Lemmas 3, 4 and 5 are essentially corollaries and extensions of Proposi- 
tions 4 and 5 in [Douc et al., 2004] and may be proved in exactly the same 
manner. We leave the details to the reader. 



Lemma 3. Suppose that one has a collection of HMMs parameterised by some vectors $\theta \in \Theta$ that satisfy assumption (A2). Furthermore suppose that one has a HMM $\{X_k, Y_k\}_{k \ge 1}$, defined on the same state spaces as the parameterised collection of HMMs, which satisfies assumption (A2) with the same constants.

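Given measurable functions $\phi_1, \phi_2, \phi_3 : \Theta \times \mathcal{X} \times \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, integers $k < l$ and $s \in \{1, 2, 3\}$, define the following functions of the HMM $\{X_k, Y_k\}_{k \ge 1}$:

$$\phi_{s;k:l}(\theta) = \sum_{i=k+1}^{l} \phi_s(\theta, X_{i-1}, X_i, Y_i),$$

and for any $n \ge 0$ define the random variables $\Delta_{0,n}$, $\Gamma_{0,n}$, $\Xi_{0,n}$ and $\Omega_{0,n}$ by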



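$$
\begin{aligned}
\Delta_{0,n}(\theta) &= \mathbb{E}_\theta\big[\phi_{1;-n:0}(\theta)\,\big|\,Y_{-n:0}\big] - \mathbb{E}_\theta\big[\phi_{1;-n:-1}(\theta)\,\big|\,Y_{-n:-1}\big],\\
\Gamma_{0,n}(\theta) &= \mathbb{E}_\theta\big[\phi_{1;-n:0}(\theta)\,\phi_{2;-n:0}(\theta)\,\big|\,Y_{-n:0}\big] - \mathbb{E}_\theta\big[\phi_{1;-n:0}(\theta)\,\big|\,Y_{-n:0}\big]\,\mathbb{E}_\theta\big[\phi_{2;-n:0}(\theta)\,\big|\,Y_{-n:0}\big]\\
&\quad - \mathbb{E}_\theta\big[\phi_{1;-n:-1}(\theta)\,\phi_{2;-n:-1}(\theta)\,\big|\,Y_{-n:-1}\big] + \mathbb{E}_\theta\big[\phi_{1;-n:-1}(\theta)\,\big|\,Y_{-n:-1}\big]\,\mathbb{E}_\theta\big[\phi_{2;-n:-1}(\theta)\,\big|\,Y_{-n:-1}\big],
\end{aligned}
$$

and $\Xi_{0,n}(\theta)$ and $\Omega_{0,n}(\theta)$ are the corresponding third-order increments: each is the difference between an expression built from the conditional expectations given $Y_{-n:0}$ of $\phi_{1;-n:0}(\theta)$, $\phi_{2;-n:0}(\theta)$, $\phi_{3;-n:0}(\theta)$ and of their products, and the same expression with every index range $-n:0$ replaced by $-n:-1$.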



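Then there exist $\sigma(Y_{-\infty:0})$ measurable random variables $\Delta_{0,\infty}(\theta)$, $\Gamma_{0,\infty}(\theta)$, $\Xi_{0,\infty}(\theta)$ and $\Omega_{0,\infty}(\theta)$ and constants $C < \infty$ and $0 < \rho < 1$, which depend only on the constants in assumption (A2), such that for any initial condition on the collection of parameterised HMMs

$$
\begin{aligned}
\mathbb{E}\Big[\sup_{\theta\in\Theta}\big|\Delta_{0,n}(\theta) - \Delta_{0,\infty}(\theta)\big|\Big] &\le C\rho^n\,\mathbb{E}\big[\|\phi_1\|_\infty(Y_0)\big],\\
\mathbb{E}\Big[\sup_{\theta\in\Theta}\big|\Gamma_{0,n}(\theta) - \Gamma_{0,\infty}(\theta)\big|\Big] &\le C\rho^n\sup_{s\in\{1,2\}}\mathbb{E}\big[\|\phi_s\|^2_\infty(Y_0)\big],\\
\mathbb{E}\Big[\sup_{\theta\in\Theta}\big|\Xi_{0,n}(\theta) - \Xi_{0,\infty}(\theta)\big|\Big] &\le C\rho^n\sup_{s\in\{1,2,3\}}\mathbb{E}\big[\|\phi_s\|^3_\infty(Y_0)\big],\\
\mathbb{E}\Big[\sup_{\theta\in\Theta}\big|\Omega_{0,n}(\theta) - \Omega_{0,\infty}(\theta)\big|\Big] &\le C\rho^n\sup_{s\in\{1,2,3\}}\mathbb{E}\big[\|\phi_s\|^3_\infty(Y_0)\big],
\end{aligned}\tag{A-21}
$$

where for all $s \in \{1, 2, 3\}$

$$\|\phi_s\|_\infty(y) = \sup_{\theta\in\Theta}\sup_{x,x'\in\mathcal{X}}\big|\phi_s(\theta, x, x', y)\big|,$$

and $\mathbb{E}[\cdot]$ denotes expectation w.r.t. the law and stationary law respectively of the process $\{X_k, Y_k\}_{k\ge1}$.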

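Lemma 4. Suppose that the assumptions of Lemma 3 all hold. Then there exist constants $C < \infty$ and $0 < \rho < 1$ such that for any initial condition on the collection of parameterised HMMs

$$\mathbb{E}\Big[\sup_{\theta\in\Theta}\big|\Delta_{0,n}(\theta) - \Delta_{0,\infty}(\theta)\big|^2\Big] \le C\rho^n\,\mathbb{E}\big[\|\phi_1\|^2_\infty(Y_0)\big]. \tag{A-22}$$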
Lemma 5. Let the same assumptions and notation as Lemma 3 be given. Then there exist constants $C < \infty$ and $0 < \rho < 1$ such that for any $k, n$

$$\mathbb{E}\Big[\big|\mathbb{E}[\Delta_{0,n}(\theta)\,|\,Y_{-\infty:-k}] - \mathbb{E}[\Delta_{0,n}(\theta)\,|\,Y_{-\infty:-k-1}]\big|\Big] \le C\rho^k\,\mathbb{E}\big[\|\phi_1\|_\infty(Y_0)\big], \tag{A-23}$$

where $\mathbb{E}[\cdot|\cdot]$ denotes conditional expectation w.r.t. the law of the process $\{X_k, Y_k\}_{k\ge1}$.

The last Lemma is a statement of the Fisher identity and the Louis missing 
information principle (see for example [Douc et al., 2004]) plus an extension 
of these results to third order derivatives of the log likelihood function. Given 
assumptions (A2)-(A6) it follows from a simple application of the dominated 
convergence theorem. 

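Lemma 6. Suppose that assumptions (A2)-(A6) hold for a collection of HMMs parametrised by some vector $\theta \in \Theta$, where for each $\theta \in \Theta$ we let $g_\theta(y|x)$ and $q_\theta(x, x')$ denote the densities of the conditional law and transition kernel of the corresponding HMM. For any $\epsilon > 0$ let $g^\epsilon_\theta(y|x)$ denote the density of the conditional law of the corresponding perturbed HMM (10). By convention we let $g^0_\theta(y|x) = g_\theta(y|x)$.

For any $\theta \in \Theta$, $\epsilon \ge 0$ and $n \ge 1$ let $\psi^\epsilon(\theta, x, x', y) = \log\big(g^\epsilon_\theta(y|x')\,q_\theta(x, x')\big)$ and, following the notation of Lemma 3, let $\psi^\epsilon_n(\theta) = \sum_{i=1}^{n}\psi^\epsilon(\theta, X_{i-1}, X_i, Y_i)$. Then for any $\theta \in \Theta$ and $\epsilon \ge 0$ the log ABC approximate likelihood function $\log p^\epsilon_\theta(\cdots)$ is three times differentiable and

$$\nabla_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n) = \mathbb{E}^\epsilon_\theta\big[\nabla_\theta\psi^\epsilon_n(\theta)\,\big|\,Y_{1:n}\big], \tag{A-24}$$

$$\nabla^2_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n) = \mathbb{E}^\epsilon_\theta\big[\nabla^2_\theta\psi^\epsilon_n(\theta)\,\big|\,Y_{1:n}\big] + \mathbb{E}^\epsilon_\theta\big[\nabla_\theta\psi^\epsilon_n(\theta)^{\otimes 2}\,\big|\,Y_{1:n}\big] - \mathbb{E}^\epsilon_\theta\big[\nabla_\theta\psi^\epsilon_n(\theta)\,\big|\,Y_{1:n}\big]^{\otimes 2} \tag{A-25}$$

and

$$
\begin{aligned}
\nabla^3_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n) &= \mathbb{E}^\epsilon_\theta\big[\nabla^3_\theta\psi^\epsilon_n\,\big|\,Y_{1:n}\big] + 3\,\mathbb{E}^\epsilon_\theta\big[\nabla^2_\theta\psi^\epsilon_n\otimes\nabla_\theta\psi^\epsilon_n\,\big|\,Y_{1:n}\big] - 3\,\mathbb{E}^\epsilon_\theta\big[\nabla^2_\theta\psi^\epsilon_n\,\big|\,Y_{1:n}\big]\otimes\mathbb{E}^\epsilon_\theta\big[\nabla_\theta\psi^\epsilon_n\,\big|\,Y_{1:n}\big]\\
&\quad - 3\,\mathbb{E}^\epsilon_\theta\big[(\nabla_\theta\psi^\epsilon_n)^{\otimes 2}\,\big|\,Y_{1:n}\big]\otimes\mathbb{E}^\epsilon_\theta\big[\nabla_\theta\psi^\epsilon_n\,\big|\,Y_{1:n}\big] + \mathbb{E}^\epsilon_\theta\big[(\nabla_\theta\psi^\epsilon_n)^{\otimes 3}\,\big|\,Y_{1:n}\big] + 2\,\mathbb{E}^\epsilon_\theta\big[\nabla_\theta\psi^\epsilon_n\,\big|\,Y_{1:n}\big]^{\otimes 3},
\end{aligned}\tag{A-26}
$$

where $\mathbb{E}^\epsilon_\theta[\cdot|\cdot]$ denotes conditional expectation w.r.t. the law of the perturbed HMM (10).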
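The identity (A-24) is easy to check numerically on a model where every quantity is computable. The sketch below is an illustration and not part of the paper's argument. It assumes a hypothetical two-state HMM with fixed transition matrix `Q`, initial law `pi` and Gaussian observation densities $g_\theta(y|x) = N(y;\mu_x + \theta, 1)$, and it takes $\epsilon = 0$; the identity has the same form with $g^\epsilon_\theta$ in place of $g_\theta$. Since $q_\theta$ does not depend on $\theta$ here, $\nabla_\theta\psi_n(\theta) = \sum_k (Y_k - \mu_{X_k} - \theta)$. Its conditional expectation is computed from the smoothed marginals of a scaled forward-backward pass and compared with a central finite difference of $\log p_\theta(Y_1,\dots,Y_n)$.

```python
import numpy as np


def emission_density(ys, mu, theta):
    # g_theta(y | x) = N(y; mu[x] + theta, 1) for every (time, state) pair.
    r = ys[:, None] - mu[None, :] - theta
    return np.exp(-0.5 * r ** 2) / np.sqrt(2.0 * np.pi)


def forward_backward(ys, pi, Q, mu, theta):
    """Scaled forward-backward recursions.

    Returns log p_theta(y_1, ..., y_n) and the smoothed marginals
    gamma[k, x] = P_theta(X_k = x | y_{1:n}).
    """
    n, m = len(ys), len(pi)
    g = emission_density(ys, mu, theta)
    alpha = np.empty((n, m))
    c = np.empty(n)  # c[k] = p_theta(y_k | y_{1:k-1})
    a = pi * g[0]
    c[0] = a.sum()
    alpha[0] = a / c[0]
    for k in range(1, n):
        a = (alpha[k - 1] @ Q) * g[k]
        c[k] = a.sum()
        alpha[k] = a / c[k]
    beta = np.ones((n, m))
    for k in range(n - 2, -1, -1):
        beta[k] = (Q @ (g[k + 1] * beta[k + 1])) / c[k + 1]
    return np.log(c).sum(), alpha * beta


def simulate(n, pi, Q, mu, theta, rng):
    # Draw a hidden path and observations from the same hypothetical HMM.
    xs = np.empty(n, dtype=int)
    xs[0] = rng.choice(len(pi), p=pi)
    for k in range(1, n):
        xs[k] = rng.choice(len(pi), p=Q[xs[k - 1]])
    return mu[xs] + theta + rng.standard_normal(n)


rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
Q = np.array([[0.9, 0.1], [0.2, 0.8]])
mu = np.array([-1.0, 1.0])
ys = simulate(200, pi, Q, mu, theta=0.3, rng=rng)

theta = 0.1
loglik, gamma = forward_backward(ys, pi, Q, mu, theta)
# Fisher identity: score = sum_k E[Y_k - mu[X_k] - theta | Y_{1:n}].
score_fisher = np.sum(gamma * (ys[:, None] - mu[None, :] - theta))
h = 1e-5
score_fd = (forward_backward(ys, pi, Q, mu, theta + h)[0]
            - forward_backward(ys, pi, Q, mu, theta - h)[0]) / (2 * h)
print(score_fisher, score_fd)
```

The two printed values should agree to several decimal places. Checking (A-25) in the same way would additionally require pairwise smoothed marginals.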

Appendix B: Proof of Theorem 1. Theorem 1 is an immediate corollary of the following three lemmas.



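Lemma 7. Suppose that assumptions (A1)-(A6) hold for a collection of HMMs parametrised by some vector $\theta \in \Theta$. Then for any $\epsilon \ge 0$ there exists a twice continuously differentiable function $l^\epsilon(\theta)$ such that

$$
\begin{aligned}
&\lim_{n\to\infty}\sup_{\theta\in\Theta}\Big|\frac{1}{n}\big(\log p^\epsilon_\theta(Y_1,\dots,Y_n) - \log p^\epsilon_{\theta^*}(Y_1,\dots,Y_n)\big) - l^\epsilon(\theta)\Big| = 0,\\
&\lim_{n\to\infty}\sup_{\theta\in\Theta}\Big\|\frac{1}{n}\nabla_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n) - \nabla_\theta l^\epsilon(\theta)\Big\| = 0,\\
&\lim_{n\to\infty}\sup_{\theta\in\Theta}\Big\|\frac{1}{n}\nabla^2_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n) - \nabla^2_\theta l^\epsilon(\theta)\Big\| = 0,
\end{aligned}\tag{B-27}
$$

$\mathbb{P}_{\theta^*}$ a.s. and in $L_1(\mathbb{P}_{\theta^*})$, where for all $\theta$ and $\epsilon$, $p^\epsilon_\theta(\cdots)$ denotes the likelihood function of the perturbed HMM (10). By convention we define $p^0_\theta(\cdots)$ to be equal to the true likelihood function $p_\theta(\cdots)$. Moreover there exists some constant $0 < K < \infty$ such that for all $\theta \in \Theta$ and $\epsilon \ge 0$

$$|l^\epsilon(\theta)|,\ \|\nabla_\theta l^\epsilon(\theta)\|,\ \|\nabla^2_\theta l^\epsilon(\theta)\| \le K, \tag{B-28}$$

and $l^\epsilon(\theta)$, $\nabla_\theta l^\epsilon(\theta)$ and $\nabla^2_\theta l^\epsilon(\theta)$ are $K$-Lipschitz (as functions of $\theta$).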

Lemma 8. Suppose that assumptions (A1)-(A6) hold for a collection of HMMs parametrised by some vector $\theta \in \Theta$ and for any $\epsilon \ge 0$ let $l^\epsilon(\theta)$ be equal to the corresponding limit function defined in Lemma 7. Then for all $\theta \in \Theta$ one has that

$$\lim_{\epsilon\to0}\nabla_\theta l^\epsilon(\theta) = \nabla_\theta l^0(\theta).$$

Lemma 9. Suppose that assumptions (A1)-(A6) hold for a collection of HMMs parametrised by some vector $\theta \in \Theta$ and for any $\epsilon \ge 0$ let $l^\epsilon(\theta)$ be equal to the corresponding limit function defined in Lemma 7. Then for all $\theta \in \Theta$ one has that

$$\lim_{\epsilon\to0}\nabla^2_\theta l^\epsilon(\theta) = \nabla^2_\theta l^0(\theta).$$

In order to complete this section we need to provide the proofs of Lemmas 
7, 8 and 9. We start by stating some properties of the perturbed conditional 
likelihood (9) that will be needed in the sequel. First note that it follows 
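from assumptions (A2) and (A5) and a simple application of the dominated convergence theorem that

$$\nabla_\theta g^\epsilon_\theta(y|x) = \frac{1}{\nu(B^y_\epsilon)}\int_{B^y_\epsilon}\nabla_\theta g_\theta(z|x)\,\nu(dz) \tag{B-29}$$

and that $\nabla_\theta g^\epsilon_\theta(y|x)$ is continuous w.r.t. $\theta$ for all $\epsilon$, $x$ and $y$. Furthermore, since

$$\int_{B^y_\epsilon}\|\nabla_\theta g_\theta(z|x)\|\,\nu(dz) \le \sup_{\theta\in\Theta}\sup_{x\in\mathcal{X}}\sup_{z\in B^y_\epsilon}\bigg(\frac{\|\nabla_\theta g_\theta(z|x)\|}{g_\theta(z|x)}\bigg)\int_{B^y_\epsilon} g_\theta(z|x)\,\nu(dz),$$

it follows from (B-29) and assumption (A5) that for any $\epsilon > 0$

$$\mathbb{E}_{\theta^*}\Big[\sup_{\theta\in\Theta}\sup_{x\in\mathcal{X}}\big\|\nabla_\theta\log g^\epsilon_\theta(Y|x)\big\|\Big] < \infty. \tag{B-30}$$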



Finally we note that analogous comments hold for $\nabla^2_\theta g^\epsilon_\theta(y|x)$ and $\nabla^3_\theta g^\epsilon_\theta(y|x)$. We now proceed to the proof of Lemma 7.



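Proof of Lemma 7. First note that for any $n$ the gradient of the mean log ABC likelihood may be decomposed into the following telescoping sum

$$\frac{1}{n}\nabla_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n) = \frac{1}{n}\sum_{i=1}^{n}h^\epsilon_\theta(Y_{1:i}), \tag{B-31}$$

where for any $k \le n$

$$h^\epsilon_\theta(Y_{k:n}) := \nabla_\theta\log p^\epsilon_\theta(Y_k,\dots,Y_n) - \nabla_\theta\log p^\epsilon_\theta(Y_k,\dots,Y_{n-1}). \tag{B-32}$$

It then follows from (A-21) and (A-24) that there exist constants $K < \infty$ and $0 < \rho < 1$ and, for all $\theta \in \Theta$ and $\epsilon \ge 0$, some $\sigma(Y_{-\infty:0})$ measurable random variable $R^\epsilon_\theta(Y_{-\infty:0})$ such that for all $n \ge 1$

$$\mathbb{E}_{\theta^*}\Big[\sup_{k\ge n}\big|h^\epsilon_\theta(Y_{-k:0}) - R^\epsilon_\theta(Y_{-\infty:0})\big|\Big] \le K\rho^n. \tag{B-33}$$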



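We note that by (B-29) and (B-30), the accompanying comments and the dominated convergence theorem, $\mathbb{E}_{\theta^*}[h^\epsilon_\theta(Y_{-n:0})]$ is continuous in $\theta$ for all $n$, and hence by (B-33) $\mathbb{E}_{\theta^*}[R^\epsilon_\theta(Y_{-\infty:0})]$ is continuous. Further it then follows from (B-33) and two applications of the ergodic theorem that for any $m > 0$

$$
\begin{aligned}
\limsup_{n\to\infty}\Big|\frac{1}{n}\sum_{i=1}^{n}h^\epsilon_\theta(Y_{1:i}) - \mathbb{E}_{\theta^*}\big[R^\epsilon_\theta(Y_{-\infty:0})\big]\Big|
&\le \limsup_{n\to\infty}\frac{1}{n}\sum_{i=1}^{m}\Big|h^\epsilon_\theta(Y_{1:i}) - \mathbb{E}_{\theta^*}\big[R^\epsilon_\theta(Y_{-\infty:0})\big]\Big|\\
&\quad + \limsup_{n\to\infty}\frac{1}{n}\sum_{i=m+1}^{n}\sup_{k\ge m}\big|h^\epsilon_\theta(Y_{i-k:i}) - R^\epsilon_\theta(Y_{-\infty:i})\big|\\
&\quad + \limsup_{n\to\infty}\Big|\frac{1}{n}\sum_{i=m+1}^{n}R^\epsilon_\theta(Y_{-\infty:i}) - \mathbb{E}_{\theta^*}\big[R^\epsilon_\theta(Y_{-\infty:0})\big]\Big|
\;\le\; K\rho^m
\end{aligned}\tag{B-34}
$$

and

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta}\Big|\frac{1}{n}\sum_{i=1}^{n}h^\epsilon_\theta(Y_{1:i})\Big| \le \limsup_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\sup_{k\ge0}\sup_{\theta\in\Theta}\big|h^\epsilon_\theta(Y_{i-k:i})\big| \le K. \tag{B-35}$$

Thus, since $m$ is arbitrary, we have by (B-31), (B-34) and (B-35) that $\mathbb{P}_{\theta^*}$ a.s.

$$\frac{1}{n}\nabla_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n) \to \mathbb{E}_{\theta^*}\big[R^\epsilon_\theta(Y_{-\infty:0})\big] \tag{B-36}$$

pointwise in $\theta$, where $\mathbb{E}_{\theta^*}[R^\epsilon_\theta(Y_{-\infty:0})]$ is continuous in $\theta$, and that $\big|\frac{1}{n}\nabla_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n)\big|$ is eventually uniformly bounded above by $K$.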

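Moreover it follows from (A-21), (A-25) and (A-26) and a similar argument to the one above that for any $\theta \in \Theta$ and $\epsilon \ge 0$ there exist $\sigma(Y_{-\infty:0})$ measurable random variables $S^\epsilon_\theta(Y_{-\infty:0})$ and $T^\epsilon_\theta(Y_{-\infty:0})$ such that $\mathbb{E}_{\theta^*}[S^\epsilon_\theta(Y_{-\infty:0})]$ and $\mathbb{E}_{\theta^*}[T^\epsilon_\theta(Y_{-\infty:0})]$ are continuous functions of $\theta$, that

$$\frac{1}{n}\nabla^2_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n) \to \mathbb{E}_{\theta^*}\big[S^\epsilon_\theta(Y_{-\infty:0})\big], \qquad \frac{1}{n}\nabla^3_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n) \to \mathbb{E}_{\theta^*}\big[T^\epsilon_\theta(Y_{-\infty:0})\big] \tag{B-37}$$

$\mathbb{P}_{\theta^*}$ a.s. and in $L_1(\mathbb{P}_{\theta^*})$, and that $\mathbb{P}_{\theta^*}$ a.s. eventually $\big|\frac{1}{n}\nabla^2_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n)\big|$ and $\big|\frac{1}{n}\nabla^3_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n)\big|$ are both uniformly bounded above by $K$. Since the uniform boundedness of $\big|\frac{1}{n}\nabla^2_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n)\big|$ and $\big|\frac{1}{n}\nabla^3_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n)\big|$ implies that both $\frac{1}{n}\nabla_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n)$ and $\frac{1}{n}\nabla^2_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n)$ are Lipschitz, the result now follows from Lemmas 1 and 2. □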
It remains to prove Lemmas 8 and 9. Since the proofs of these two lemmas are almost identical we prove only Lemma 8 and leave the details of the proof of Lemma 9 to the reader.



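Proof of Lemma 8. It follows from (B-31) and (B-33) that in order to prove the result it is sufficient to show that

$$\lim_{\epsilon\to0}\mathbb{E}_{\theta^*}\Big[\frac{1}{n}\nabla_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n)\Big] = \mathbb{E}_{\theta^*}\Big[\frac{1}{n}\nabla_\theta\log p^0_\theta(Y_1,\dots,Y_n)\Big] \tag{B-38}$$

for all $n$ and $\theta$, and hence, by (A-24), (B-29) and (B-30), the accompanying comments and the dominated convergence theorem, it is sufficient to show that

$$\lim_{\epsilon\to0}\mathbb{E}^\epsilon_\theta\Big[\nabla_\theta\log\big(g^\epsilon_\theta(Y_k|X_k)\,q_\theta(X_{k-1},X_k)\big)\,\Big|\,Y_{1:n}\Big] = \mathbb{E}_\theta\Big[\nabla_\theta\log\big(g_\theta(Y_k|X_k)\,q_\theta(X_{k-1},X_k)\big)\,\Big|\,Y_{1:n}\Big] \tag{B-39}$$

$\mathbb{P}_{\theta^*}$ a.s. for all $\theta$ and $1 \le k \le n$. Recall that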

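$$\mathbb{E}^\epsilon_\theta\Big[\nabla_\theta\log\big(g^\epsilon_\theta(Y_k|X_k)\,q_\theta(X_{k-1},X_k)\big)\,\Big|\,Y_{1:n}\Big] = \frac{\int_{\mathcal{X}^n}\nabla_\theta\log\big(g^\epsilon_\theta(Y_k|x_k)\,q_\theta(x_{k-1},x_k)\big)\prod_{i=1}^{n}g^\epsilon_\theta(Y_i|x_i)\,q_\theta(x_{i-1},x_i)\,\mu(dx_{1:n})}{\int_{\mathcal{X}^n}\prod_{i=1}^{n}g^\epsilon_\theta(Y_i|x_i)\,q_\theta(x_{i-1},x_i)\,\mu(dx_{1:n})} \tag{B-40}$$

and

$$\mathbb{E}_\theta\Big[\nabla_\theta\log\big(g_\theta(Y_k|X_k)\,q_\theta(X_{k-1},X_k)\big)\,\Big|\,Y_{1:n}\Big] = \frac{\int_{\mathcal{X}^n}\nabla_\theta\log\big(g_\theta(Y_k|x_k)\,q_\theta(x_{k-1},x_k)\big)\prod_{i=1}^{n}g_\theta(Y_i|x_i)\,q_\theta(x_{i-1},x_i)\,\mu(dx_{1:n})}{\int_{\mathcal{X}^n}\prod_{i=1}^{n}g_\theta(Y_i|x_i)\,q_\theta(x_{i-1},x_i)\,\mu(dx_{1:n})}. \tag{B-41}$$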

Further we have by (B-30) and the accompanying comments that we can use the Lebesgue differentiation theorem (see for example [Wheeden and Zygmund, 1977]) to deduce that for all $x \in \mathcal{X}$

$$\nabla_\theta g^\epsilon_\theta(Y_k|x) \to \nabla_\theta g_\theta(Y_k|x), \qquad g^\epsilon_\theta(Y_k|x) \to g_\theta(Y_k|x) \qquad \text{as } \epsilon \to 0, \tag{B-42}$$

$\mathbb{P}_{\theta^*}$ a.s. It now follows from assumptions (A2) and (A5), (B-30), (B-42) and the dominated convergence theorem that the numerator and denominator of the quantity in (B-40) converge respectively to the numerator and denominator of the quantity in (B-41). Since by assumption (A4) we have that

$$\int_{\mathcal{X}^n}\prod_{i=1}^{n}g_\theta(Y_i|x_i)\,q_\theta(x_{i-1},x_i)\,\mu(dx_{1:n}) > 0$$

$\mathbb{P}_{\theta^*}$ a.s., we obtain (B-39). □
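In the one-dimensional Lebesgue case the limit (B-42), and the $O(\epsilon^2)$ size of $g^\epsilon_\theta - g_\theta$ behind Theorem 6, can be checked in closed form. The sketch below assumes, for illustration only, a Gaussian observation density $g_\theta(y|x) = \varphi(y - x - \theta)$, with $\varphi$ the standard normal density, and the uniform-window perturbation $g^\epsilon_\theta(y|x) = (2\epsilon)^{-1}\int_{-\epsilon}^{\epsilon}g_\theta(y+z|x)\,dz$. Everything is written in terms of the residual $u = y - x - \theta$. The helper names are hypothetical.

```python
import math


def phi(u):
    # Standard normal density.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(u):
    # Standard normal distribution function.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def perturbed_density(u, eps):
    # g^eps_theta(y|x) at residual u = y - x - theta: average of phi over [u - eps, u + eps].
    return (Phi(u + eps) - Phi(u - eps)) / (2.0 * eps)


def perturbed_score(u, eps):
    # d/dtheta of g^eps_theta(y|x); since u = y - x - theta this is minus the u-derivative.
    return (phi(u - eps) - phi(u + eps)) / (2.0 * eps)


u = 0.7
for eps in (0.5, 0.1, 0.02):
    bias = perturbed_density(u, eps) - phi(u)
    ratio = perturbed_score(u, eps) / perturbed_density(u, eps)
    # bias / eps**2 approaches (u**2 - 1) * phi(u) / 6; ratio approaches u = grad g / g.
    print(eps, bias / eps ** 2, ratio)
```

The second column settling to a constant shows that the bias decays like $\epsilon^2$ rather than $\epsilon$. The third column converging to $u$ is the ratio form of (B-42).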
Appendix C: Proofs of Theorems 2, 3 and 4. 

Proof of Theorem 2. It follows immediately from Theorem 1 that the first part of Theorem 2 will hold with the set in question taken to be the set of maximisers of $l^\epsilon(\theta)$. Note that since $l^\epsilon(\theta)$ is continuous and $\Theta$ is compact, this set will always be well defined and non-empty. Further, (13) follows from the uniform convergence of $l^\epsilon(\theta)$ to $l^0(\theta)$ and the continuity of the surfaces.

It remains to prove the second part of the theorem. Suppose now that $\nabla^2_\theta l^0(\theta^*)$ is strictly negative definite. We have from the last part of Theorem 1 that

$$\lim_{\delta\to0}\limsup_{\epsilon\to0}\sup_{\|\theta-\theta^*\|<\delta}\big\|\nabla^2_\theta l^\epsilon(\theta) - \nabla^2_\theta l^0(\theta^*)\big\| = 0. \tag{C-43}$$

Equation (C-43) implies that there exists some $\delta > 0$ such that for sufficiently small $\epsilon$ the surface $l^\epsilon(\theta)$ has at most one local maximum in the $\delta$ neighbourhood of $\theta^*$. The result now follows from (13). □



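Proof of Theorem 3. Letting the matrix appearing in Theorem 3 be equal to $\nabla^2_\theta l^\epsilon(\theta^*_\epsilon)$, it follows from Theorem 1 and standard results on the asymptotic normality of the MLE (see for example [Douc et al., 2004]) that in order to prove Theorem 3 it is sufficient to show that for $\epsilon$ sufficiently small there exists some strictly positive definite matrix $J_\epsilon$ such that

$$\frac{1}{\sqrt{n}}\nabla_\theta\log p^\epsilon_{\theta^*_\epsilon}(Y_1,\dots,Y_n) \Rightarrow N(0, J_\epsilon) \tag{C-44}$$

and

$$J_\epsilon \to I \tag{C-45}$$

as $\epsilon \to 0$, where $I = -\nabla^2_\theta l^0(\theta^*)$.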
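We begin by proving (C-44). We have by (B-31) and (B-33) that

$$\frac{1}{\sqrt{n}}\nabla_\theta\log p^\epsilon_{\theta^*_\epsilon}(Y_1,\dots,Y_n) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:i}) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Big(h^\epsilon_{\theta^*_\epsilon}(Y_{1:i}) - R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:i})\Big), \tag{C-46}$$

where $h^\epsilon_{\theta^*_\epsilon}(Y_{1:i})$ is as defined in (B-32). We note that it follows from (A-22) that one can use similar arguments to those used to deduce (B-33) to show that

$$\mathbb{E}_{\theta^*}\Big[\sup_{k\ge n}\big|h^\epsilon_{\theta^*_\epsilon}(Y_{-k:0}) - R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:0})\big|^2\Big] \le K\rho^n. \tag{C-47}$$

It then follows from (C-47) that

$$\mathbb{E}_{\theta^*}\big[h^\epsilon_{\theta^*_\epsilon}(Y_{-n:i})\,\big|\,Y_{-\infty:0}\big] \to \mathbb{E}_{\theta^*}\big[R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:i})\,\big|\,Y_{-\infty:0}\big],$$

and likewise for conditional expectations w.r.t. $\sigma(Y_{-\infty:-1})$, and hence by (A-23) that there exists some $K$ such that

$$\mathbb{E}_{\theta^*}\Big[\big|\mathbb{E}_{\theta^*}\big[R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:i})\,\big|\,Y_{-\infty:0}\big] - \mathbb{E}_{\theta^*}\big[R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:i})\,\big|\,Y_{-\infty:-1}\big]\big|\Big] \le K\rho^i \tag{C-48}$$

for all $i$. Equation (C-48) immediately implies that the sequence of random variables $R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:0}), R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:1}), \dots$ satisfies the conditions of Theorem 5 in [Volny, 1993] and hence we have that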



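$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:i}) \Rightarrow N(0, J_\epsilon), \tag{C-49}$$

where

$$J_\epsilon = \lim_{n\to\infty}\mathbb{E}_{\theta^*}\bigg[\frac{1}{n}\Big(\sum_{i=1}^{n}R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:i})\Big)\Big(\sum_{i=1}^{n}R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:i})\Big)^{T}\bigg]. \tag{C-50}$$

Finally we note that it follows from (B-33), Markov's inequality and the Borel-Cantelli lemma that

$$\mathbb{P}_{\theta^*}\Big(\big|h^\epsilon_{\theta^*_\epsilon}(Y_{1:i}) - R^\epsilon_{\theta^*_\epsilon}(Y_{-\infty:i})\big| > \rho^{i/2}\ \text{i.o.}\Big) = 0, \tag{C-51}$$

so that the second sum in (C-46) converges to $0$, $\mathbb{P}_{\theta^*}$ a.s.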
Equation (C-44) now follows from (C-46), (C-49) and (C-51).

To complete the proof of the theorem it remains to prove (C-45). It immediately follows from (B-32), (C-47) and (C-50) that for all $\epsilon$



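$$J_\epsilon = \lim_{n\to\infty}\mathbb{E}_{\theta^*}\Big[\frac{1}{n}\nabla_\theta\log p^\epsilon_{\theta^*_\epsilon}(Y_1,\dots,Y_n)\,\nabla_\theta\log p^\epsilon_{\theta^*_\epsilon}(Y_1,\dots,Y_n)^{T}\Big], \tag{C-52}$$

where the convergence is uniform in $\epsilon$. Next we note that by a simple application of the Fisher identity (see for example [Douc et al., 2004])

$$\mathbb{E}_{\theta^*}\big[\nabla^2_\theta\log p_{\theta^*}(Y_1,\dots,Y_n)\big] = -\mathbb{E}_{\theta^*}\big[\nabla_\theta\log p_{\theta^*}(Y_1,\dots,Y_n)\,\nabla_\theta\log p_{\theta^*}(Y_1,\dots,Y_n)^{T}\big]$$

and thus by (B-37) and Lemma 7 that

$$\lim_{n\to\infty}\mathbb{E}_{\theta^*}\Big[\frac{1}{n}\nabla_\theta\log p_{\theta^*}(Y_1,\dots,Y_n)\,\nabla_\theta\log p_{\theta^*}(Y_1,\dots,Y_n)^{T}\Big] = -\nabla^2_\theta l^0(\theta^*) = I. \tag{C-53}$$

In order to complete the proof of (C-45) it is thus sufficient, by (C-52) and (C-53), to show that for all $n$

$$\lim_{\epsilon\to0}\mathbb{E}_{\theta^*}\Big[\frac{1}{n}\nabla_\theta\log p^\epsilon_{\theta^*_\epsilon}(Y_{1:n})\,\nabla_\theta\log p^\epsilon_{\theta^*_\epsilon}(Y_{1:n})^{T}\Big] = \mathbb{E}_{\theta^*}\Big[\frac{1}{n}\nabla_\theta\log p_{\theta^*}(Y_{1:n})\,\nabla_\theta\log p_{\theta^*}(Y_{1:n})^{T}\Big]. \tag{C-54}$$

Finally we note that (C-54) can be proved in exactly the same way as (B-38) in the proof of Lemma 8. In order to do this we need to show that, for all $x \in \mathcal{X}$,

$$\nabla_\theta g^\epsilon_{\theta^*_\epsilon}(Y_k|x) \to \nabla_\theta g_{\theta^*}(Y_k|x), \qquad g^\epsilon_{\theta^*_\epsilon}(Y_k|x) \to g_{\theta^*}(Y_k|x) \tag{C-55}$$

as $\epsilon \to 0$. However (C-55) follows from (B-42) and the fact that, by assumptions (A2) and (A6), $\mathbb{P}_{\theta^*}$ a.s. the functions $\nabla_\theta g_\theta(Y_k + z|x)$ and $g_\theta(Y_k + z|x)$ are uniformly Lipschitz (as functions of $\theta$) for all $z$ in a neighbourhood of the origin. □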



Proof of Theorem 4. The proof of this result follows from standard Bernstein-von Mises type arguments, see for example [Borwanker et al., 1971]. □

Appendix D: Proofs of Theorems 5 and 6. A central role in the 
proof of Theorem 5 will be played by the following time inhomogeneous 
versions of the perturbed HMM (10). 

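Suppose that one has a collection of HMMs parametrised by some parameter vector $\theta \in \Theta$ and that for each value of $\theta$ the conditional laws and transition kernels of the corresponding HMM have densities $g_\theta(y|x)$ and $q_\theta(x, x')$ respectively. Given some $\theta \in \Theta$ and $\epsilon > 0$ define the HMM

$$(X^{\epsilon,+}_k, Y^{\epsilon,+}_k) = (X_k, Y_k)\ \text{for all } k \le 0; \qquad (X^{\epsilon,+}_k, Y^{\epsilon,+}_k) = (X_k, Y_k + \epsilon Z_k)\ \text{otherwise}, \tag{D-56}$$

where $\{X_k, Y_k\}$ is the original HMM and $\{Z_k\}_{k\ge0}$ is a collection of i.i.d. random variables distributed uniformly on the unit ball $B_1$. Similarly define the HMM $\{X^{\epsilon,-}_k, Y^{\epsilon,-}_k\}$ by

$$(X^{\epsilon,-}_k, Y^{\epsilon,-}_k) = (X_k, Y_k)\ \text{for all } k < 0; \qquad (X^{\epsilon,-}_k, Y^{\epsilon,-}_k) = (X_k, Y_k + \epsilon Z_k)\ \text{otherwise}. \tag{D-57}$$

Clearly the transition kernels of the HMMs (D-56) and (D-57) are equal to $q_\theta(x, x')$ and the conditional densities of the observed state are equal to

$$g^{\epsilon,+}_{\theta,k}(y|x) = \begin{cases} g^\epsilon_\theta(y|x) & \text{if } k > 0,\\ g_\theta(y|x) & \text{otherwise,}\end{cases} \tag{D-58}$$

and

$$g^{\epsilon,-}_{\theta,k}(y|x) = \begin{cases} g^\epsilon_\theta(y|x) & \text{if } k \ge 0,\\ g_\theta(y|x) & \text{otherwise,}\end{cases} \tag{D-59}$$

respectively.

Let $p^{\epsilon,+}_\theta(\cdots)$, $\mathbb{P}^{\epsilon,+}_\theta(\cdot)$, $\mathbb{E}^{\epsilon,+}_\theta[\cdot]$, $\mathbb{E}^{\epsilon,+}_\theta[\cdot|\cdot]$ and $\mathbb{P}^{\epsilon,+}_\theta(\cdot|\cdot)$, and $p^{\epsilon,-}_\theta(\cdots)$, $\mathbb{P}^{\epsilon,-}_\theta(\cdot)$, $\mathbb{E}^{\epsilon,-}_\theta[\cdot]$, $\mathbb{E}^{\epsilon,-}_\theta[\cdot|\cdot]$ and $\mathbb{P}^{\epsilon,-}_\theta(\cdot|\cdot)$ denote the likelihood functions, laws, expectation, conditional expectation and conditional probability operators w.r.t. the laws of (D-56) and (D-57) respectively. It follows by definition that

$$
\begin{aligned}
p_\theta(y_1,\dots,y_n) &= p^{\epsilon,+}_\theta(Y_{-n+1} = y_1,\dots,Y_0 = y_n),\\
p^\epsilon_\theta(y_1,\dots,y_n) &= p^{\epsilon,-}_\theta(Y_0 = y_1,\dots,Y_{n-1} = y_n),\\
p^{\epsilon,+}_\theta(Y_{-k+1} = y_{-k+1},\dots,Y_n = y_n) &= p^{\epsilon,-}_\theta(Y_{-k} = y_{-k+1},\dots,Y_{n-1} = y_n).
\end{aligned}\tag{D-60}
$$

Recall that by (B-31) and (B-33) we have that

$$\nabla_\theta l^0(\theta) - \nabla_\theta l^\epsilon(\theta) = \lim_{n\to\infty}\frac{1}{n}\Big(\mathbb{E}_{\theta^*}\big[\nabla_\theta\log p_\theta(Y_1,\dots,Y_n)\big] - \mathbb{E}_{\theta^*}\big[\nabla_\theta\log p^\epsilon_\theta(Y_1,\dots,Y_n)\big]\Big) \tag{D-61}$$

and thus by (D-60) and (D-61) we have the telescoping sum

$$\nabla_\theta l^0(\theta) - \nabla_\theta l^\epsilon(\theta) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\Big(\mathbb{E}_{\theta^*}\big[\nabla_\theta\log p^{\epsilon,+}_\theta(Y_{-n+i},\dots,Y_{i-1})\big] - \mathbb{E}_{\theta^*}\big[\nabla_\theta\log p^{\epsilon,-}_\theta(Y_{-n+i},\dots,Y_{i-1})\big]\Big). \tag{D-62}$$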
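The two time-inhomogeneous processes differ only in whether the observation at time $k = 0$ is perturbed, which is what makes each term of (D-62) small. The sketch below illustrates (D-56) and (D-57) under stated assumptions: the observations are placeholder Gaussian vectors rather than the output of a particular HMM, $Z_k$ is drawn uniformly on the unit ball in $\mathbb{R}^d$, and, as a coupling choice of ours, both processes reuse the same draws of $Z_k$. The helper names `uniform_unit_ball` and `perturb_from` are hypothetical.

```python
import numpy as np


def uniform_unit_ball(rng, size, d):
    # Gaussian direction normalised to the sphere, radius U**(1/d): uniform on the unit ball in R^d.
    v = rng.standard_normal((size, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    r = rng.uniform(size=(size, 1)) ** (1.0 / d)
    return r * v


def perturb_from(ys, ks, z, eps, first):
    # Copy of ys in which Y_k is replaced by Y_k + eps * Z_k for every time index k >= first.
    out = ys.copy()
    mask = ks >= first
    out[mask] += eps * z[mask]
    return out


rng = np.random.default_rng(2)
d, eps = 2, 0.1
ks = np.arange(-5, 6)                      # time indices k = -5, ..., 5
ys = rng.standard_normal((len(ks), d))     # placeholder observations Y_k
z = uniform_unit_ball(rng, len(ks), d)     # one shared draw of Z_k (coupling choice)
y_plus = perturb_from(ys, ks, z, eps, first=1)   # (D-56): perturbed for k > 0
y_minus = perturb_from(ys, ks, z, eps, first=0)  # (D-57): perturbed for k >= 0
# Array positions (not time indices) where the two processes differ.
print(np.flatnonzero(np.any(y_plus != y_minus, axis=1)))
```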



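We now note that Theorems 5 and 6 follow immediately from (D-62) and the following lemma.

Lemma 10. Suppose that assumptions (A2)-(A7) hold for a collection of HMMs parametrised by some vector $\theta \in \Theta$. Then there exists a finite constant $K$ such that for all $\epsilon > 0$ and integers $k, n$

$$\Big\|\mathbb{E}_{\theta^*}\big[\nabla_\theta\log p^{\epsilon,+}_\theta(Y_{-k},\dots,Y_n)\big] - \mathbb{E}_{\theta^*}\big[\nabla_\theta\log p^{\epsilon,-}_\theta(Y_{-k},\dots,Y_n)\big]\Big\| \le K\epsilon. \tag{D-63}$$

Furthermore suppose that $\nu$ is Lebesgue measure and that assumption (A8) also holds. Then

$$\Big\|\mathbb{E}_{\theta^*}\big[\nabla_\theta\log p^{\epsilon,+}_\theta(Y_{-k},\dots,Y_n)\big] - \mathbb{E}_{\theta^*}\big[\nabla_\theta\log p^{\epsilon,-}_\theta(Y_{-k},\dots,Y_n)\big]\Big\| \le K\epsilon^2. \tag{D-64}$$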



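Proof. We shall prove only the first part of the lemma, the proof of the second part being almost identical. Clearly analogous expressions to (A-24) hold for the HMMs (D-56) and (D-57), and thus in particular we have that the term on the left hand side of (D-63) is bounded by

$$\sum_{i=-k}^{n}\mathbb{E}_{\theta^*}\Big[\Big|\mathbb{E}^{\epsilon,+}_\theta\big[\nabla_\theta\log\big(g^{\epsilon,+}_{\theta,i}(Y_i|X_i)\,q_\theta(X_{i-1},X_i)\big)\,\big|\,Y_{-k:n}\big] - \mathbb{E}^{\epsilon,-}_\theta\big[\nabla_\theta\log\big(g^{\epsilon,-}_{\theta,i}(Y_i|X_i)\,q_\theta(X_{i-1},X_i)\big)\,\big|\,Y_{-k:n}\big]\Big|\Big], \tag{D-65}$$

where $g^{\epsilon,+}_{\theta,i}$ and $g^{\epsilon,-}_{\theta,i}$ are as in (D-58) and (D-59). Using the identity

$$\frac{\nabla_\theta g_\theta(Y_0|X_0)}{g_\theta(Y_0|X_0)} - \frac{\nabla_\theta g^\epsilon_\theta(Y_0|X_0)}{g^\epsilon_\theta(Y_0|X_0)} = \frac{\nabla_\theta g_\theta(Y_0|X_0) - \nabla_\theta g^\epsilon_\theta(Y_0|X_0)}{g_\theta(Y_0|X_0)} + \frac{\nabla_\theta g^\epsilon_\theta(Y_0|X_0)\big(g^\epsilon_\theta(Y_0|X_0) - g_\theta(Y_0|X_0)\big)}{g^\epsilon_\theta(Y_0|X_0)\,g_\theta(Y_0|X_0)}$$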



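it is clear from (B-29) and assumptions (A6) and (A7) that there exists some $K'$ such that

$$\mathbb{E}_{\theta^*}\Big[\Big|\mathbb{E}^{\epsilon,+}_\theta\big[\nabla_\theta\log\big(g^{\epsilon,+}_{\theta,i}(Y_i|X_i)\,q_\theta(X_{i-1},X_i)\big)\,\big|\,Y_{-k:n}\big] - \mathbb{E}^{\epsilon,+}_\theta\big[\nabla_\theta\log\big(g^{\epsilon,-}_{\theta,i}(Y_i|X_i)\,q_\theta(X_{i-1},X_i)\big)\,\big|\,Y_{-k:n}\big]\Big|\Big] \le K'\epsilon. \tag{D-66}$$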



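It then follows from the definitions of $g^{\epsilon,+}_{\theta,i}$ and $g^{\epsilon,-}_{\theta,i}$ and from assumptions (A2)-(A6) that in order to derive (D-63) from (D-65) it is sufficient to show that for all $i, k, n$

$$\mathbb{E}_{\theta^*}\Big[\big\|\mathbb{P}^{\epsilon,+}_\theta\big((X_{i-1},X_i)\in\cdot\,\big|\,Y_{-k:n}\big) - \mathbb{P}^{\epsilon,-}_\theta\big((X_{i-1},X_i)\in\cdot\,\big|\,Y_{-k:n}\big)\big\|_{TV}\Big] \le K''\epsilon\rho^{|i|} \tag{D-67}$$

for some $K''$ and some $0 < \rho < 1$.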

We first note that it follows from standard results concerning uniformly mixing Markov chains, see for example [Cappe et al., 2005, Del Moral, 2004], that there exist some $K$ and $0 < \rho < 1$ such that for all $i$ and $x, x' \in \mathcal{X}$ one has that

$$\big\|\mathbb{P}^{\epsilon,+}_\theta\big((X_{i-1},X_i)\in\cdot\,\big|\,X_0 = x\big) - \mathbb{P}^{\epsilon,+}_\theta\big((X_{i-1},X_i)\in\cdot\,\big|\,X_0 = x'\big)\big\|_{TV} \le K\rho^{|i|-1}. \tag{D-68}$$

Since by definition the marginal laws of $Y_{-\infty:-1;1:\infty}$ under $\mathbb{P}^{\epsilon,+}_\theta$ and $\mathbb{P}^{\epsilon,-}_\theta$ are equal, it follows that in order to prove (D-67) it is sufficient to prove it for the case $i = 0$.
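For a finite state space the geometric decay in (D-68) can be seen directly. The sketch below uses a hypothetical three-state transition matrix and the Dobrushin contraction coefficient $\delta(Q) = \max_{x,x'}\|Q(x,\cdot) - Q(x',\cdot)\|_{TV}$, for which $\|Q^i(x,\cdot) - Q^i(x',\cdot)\|_{TV} \le \delta(Q)^i$. This is a standard sufficient condition, used here as a stand-in for the uniform mixing bounds of the cited references. The law of the pair $(X_{i-1}, X_i)$ is obtained from the law of $X_{i-1}$ by one further transition, so the total variation distance between pair laws equals that between the laws of $X_{i-1}$. This is consistent with the exponent $|i| - 1$ in (D-68).

```python
import numpy as np


def tv(p, q):
    # Total variation distance sup_A |p(A) - q(A)| = half the L1 distance.
    return 0.5 * np.abs(p - q).sum()


def dobrushin(Q):
    # Largest TV distance between two rows of Q: a one-step contraction coefficient.
    m = Q.shape[0]
    return max(tv(Q[a], Q[b]) for a in range(m) for b in range(m))


Q = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
rho = dobrushin(Q)
P = np.eye(3)
for i in range(1, 11):
    P = P @ Q  # row x of P is the law of X_i given X_0 = x
    worst = max(tv(P[a], P[b]) for a in range(3) for b in range(3))
    assert worst <= rho ** i + 1e-12  # submultiplicativity of the Dobrushin coefficient
    print(i, worst, rho ** i)
```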

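To prove (D-67) for $i = 0$ we shall make use of the following simple identities. For any $\phi \in L_\infty$

$$\mathbb{E}^{\epsilon,+}_\theta\big[\phi(X_0)\,\big|\,Y_{-\infty:\infty}\big] = \frac{\mathbb{E}^{\epsilon,+}_\theta\big[\phi(X_0)\,g_\theta(Y_0|X_0)\,\big|\,Y_{-\infty:-1;1:\infty}\big]}{\mathbb{E}^{\epsilon,+}_\theta\big[g_\theta(Y_0|X_0)\,\big|\,Y_{-\infty:-1;1:\infty}\big]}, \qquad \mathbb{E}^{\epsilon,-}_\theta\big[\phi(X_0)\,\big|\,Y_{-\infty:\infty}\big] = \frac{\mathbb{E}^{\epsilon,-}_\theta\big[\phi(X_0)\,g^\epsilon_\theta(Y_0|X_0)\,\big|\,Y_{-\infty:-1;1:\infty}\big]}{\mathbb{E}^{\epsilon,-}_\theta\big[g^\epsilon_\theta(Y_0|X_0)\,\big|\,Y_{-\infty:-1;1:\infty}\big]}. \tag{D-69}$$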
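It then follows from (D-69) using basic algebra that

$$\sup_{\|\phi\|_\infty\le1}\mathbb{E}_{\theta^*}\Big[\big|\mathbb{E}^{\epsilon,+}_\theta\big[\phi(X_0)\,\big|\,Y_{-\infty:\infty}\big] - \mathbb{E}^{\epsilon,-}_\theta\big[\phi(X_0)\,\big|\,Y_{-\infty:\infty}\big]\big|\Big] \le 2\,\mathbb{E}_{\theta^*}\Bigg[\frac{\mathbb{E}^{\epsilon,-}_\theta\big[|g^\epsilon_\theta(Y_0|X_0) - g_\theta(Y_0|X_0)|\,\big|\,Y_{-\infty:-1;1:\infty}\big]}{\mathbb{E}^{\epsilon,+}_\theta\big[g_\theta(Y_0|X_0)\,\big|\,Y_{-\infty:-1;1:\infty}\big]}\Bigg]. \tag{D-70}$$

The result now follows immediately from (D-70) and assumption (A7). □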
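The algebra behind (D-70) can be checked on a finite state space. Because (D-56) and (D-57) differ only in the observation density at time 0, both conditional laws of $X_0$ in (D-69) are reweightings of the same law $\mu$ of $X_0$ given $Y_{-\infty:-1;1:\infty}$, by $g_\theta(Y_0|\cdot)$ and by $g^\epsilon_\theta(Y_0|\cdot)$ respectively. The sketch below uses hypothetical values of $\mu$, $g_\theta$ and $g^\epsilon_\theta$. It computes the exact supremum over $\|\phi\|_\infty \le 1$ of the difference of the two posterior means, which is the $L_1$ distance between the reweighted laws, and compares it with the bound $2\mu(|g^\epsilon_\theta - g_\theta|)/\mu(g_\theta)$.

```python
import numpy as np


def reweight(mu, w):
    # Posterior of X_0 obtained by reweighting the law mu by likelihood values w, as in (D-69).
    p = mu * w
    return p / p.sum()


rng = np.random.default_rng(3)
m = 6
mu = rng.dirichlet(np.ones(m))                    # hypothetical law of X_0 given Y_k, k != 0
g = rng.uniform(0.5, 2.0, size=m)                 # g_theta(Y_0 | x) at the observed Y_0
g_eps = g * (1.0 + 0.05 * rng.uniform(-1.0, 1.0, size=m))  # hypothetical g^eps_theta(Y_0 | x)

# sup over |phi| <= 1 of the difference of posterior means equals the L1 distance.
lhs = np.abs(reweight(mu, g) - reweight(mu, g_eps)).sum()
rhs = 2.0 * np.sum(mu * np.abs(g_eps - g)) / np.sum(mu * g)
print(lhs, rhs)
```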